From: Sylvain Lebresne <sylvain@datastax.com>
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Date: Fri, 30 Aug 2013 13:55:56 +0200
Subject: Re: mysterious 'column1' in cql describe
Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=047d7b33d152c6c7f404e528e7c2

--047d7b33d152c6c7f404e528e7c2
Content-Type: text/plain; charset=ISO-8859-1

> Why does the explicit definition of columns in a column family
> significantly improve performance and key cache hit ratio (the last one
> being almost zero when there are no explicit column definitions)?

It doesn't, not in itself at least. So something else has changed, or
something is wrong in your before/after comparison. But it's hard to say
without at least a minimum of information on how you actually observed
such a "significant performance improvement" (which queries, for instance).

As for the key cache hit rate, adding a column definition certainly has no
effect on it in itself. But defining a new secondary index might, and the
code you've provided to add the column does have a setIndexType call.
Again, it's hard to be definitive on that, because the code you've shown
sets a CUSTOM index type without providing any indexOption, which is
*invalid* (and rejected as such by Cassandra). So either the code above is
not complete, or it's not the one you've used, or Hector is doing some
weird stuff behind your back. In any case, if an index was created, then
*that* could easily explain a before/after performance difference.

--
Sylvain

> 2013/8/30 Sylvain Lebresne <sylvain@datastax.com>
>
>> The short story is that you're probably not up to date on how CQL and
>> thrift table definitions relate to one another, which may not be
>> exactly how you think they do. If you haven't done so, I'd suggest
>> reading
>> http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
>> (should answer your "what about dynamic column name" case) and
>> http://www.datastax.com/dev/blog/thrift-to-cql3 (should help explain how
>> CQL3 interprets thrift tables, and why you saw what you saw).
>>
>> --
>> Sylvain
>>
>>
>> On Fri, Aug 30, 2013 at 9:50 AM, Alexander Shutyaev <shutyaev@gmail.com> wrote:
>>
>>> Hi all!
>>>
>>> We have encountered the following problem. We create our column
>>> families via Hector like this:
>>>
>>> ColumnFamilyDefinition cfdef =
>>>     HFactory.createColumnFamilyDefinition("mykeyspace", "mycf");
>>> cfdef.setColumnType(ColumnType.STANDARD);
>>> cfdef.setComparatorType(ComparatorType.UTF8TYPE);
>>> cfdef.setDefaultValidationClass("BytesType");
>>> cfdef.setKeyValidationClass("UTF8Type");
>>> cfdef.setReadRepairChance(0.1);
>>> cfdef.setGcGraceSeconds(864000);
>>> cfdef.setMinCompactionThreshold(4);
>>> cfdef.setMaxCompactionThreshold(32);
>>> cfdef.setReplicateOnWrite(true);
>>> cfdef.setCompactionStrategy("SizeTieredCompactionStrategy");
>>> Map<String, String> compressionOptions = new HashMap<String, String>();
>>> compressionOptions.put("sstable_compression", "");
>>> cfdef.setCompressionOptions(compressionOptions);
>>> cluster.addColumnFamily(cfdef, true);
>>>
>>> When we *describe* this column family via *cqlsh*, we get this:
>>>
>>> CREATE TABLE "mycf" (
>>>   key text,
>>>   column1 text,
>>>   value blob,
>>>   PRIMARY KEY (key, column1)
>>> ) WITH COMPACT STORAGE AND
>>>   bloom_filter_fp_chance=0.010000 AND
>>>   caching='KEYS_ONLY' AND
>>>   comment='' AND
>>>   dclocal_read_repair_chance=0.000000 AND
>>>   gc_grace_seconds=864000 AND
>>>   read_repair_chance=0.100000 AND
>>>   replicate_on_write='true' AND
>>>   populate_io_cache_on_flush='false' AND
>>>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>>>   compression={};
>>>
>>> As you can see, there is a mysterious *column1*, and moreover it is
>>> added to the primary key. We thought this was wrong, so we tried to
>>> get rid of it. We managed to do so by adding explicit column
>>> definitions like this:
>>>
>>> BasicColumnDefinition cdef = new BasicColumnDefinition();
>>> cdef.setName(StringSerializer.get().toByteBuffer("mycolumn"));
>>> cdef.setValidationClass(ComparatorType.BYTESTYPE.getTypeName());
>>> cdef.setIndexType(ColumnIndexType.CUSTOM);
>>> cfdef.addColumnDefinition(cdef);
>>>
>>> After this the primary key looked like this:
>>>
>>> PRIMARY KEY (key)
>>>
>>> The effect of this was *overwhelming*: we got a tremendous performance
>>> improvement and, according to the stats, the key cache began working,
>>> while previously its hit ratio was close to zero.
>>>
>>> My questions are:
>>>
>>> 1) What is this all about? Is what we did right?
>>> 2) In this project we can provide explicit column definitions. But in
>>> another project we have some column families where this is not
>>> possible because column names are dynamic (based on timestamps). If
>>> what we did is right, how can we adapt this solution to the dynamic
>>> column name case?
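
As an illustration of the mapping discussed in this thread, here is a
minimal Java sketch of how CQL3 presents a thrift column family that has
no column metadata: each (column name, value) cell of a thrift row is
shown as one CQL row (key, column1, value), which is why column1 appears
in the primary key. This is a toy model under stated assumptions, not
Cassandra's or Hector's code: CompactStorageSketch, CqlRow, toCqlRows and
demo are hypothetical names, the sample key and timestamps are made up,
and column values are simplified to strings instead of blobs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: hypothetical names, not Cassandra or Hector classes.
class CompactStorageSketch {

    // One CQL row as it would be displayed for the table in this thread.
    static final class CqlRow {
        final String key;      // partition key (the thrift row key)
        final String column1;  // clustering column (the thrift column name)
        final String value;    // thrift column value (a blob in the real table)

        CqlRow(String key, String column1, String value) {
            this.key = key;
            this.column1 = column1;
            this.value = value;
        }

        @Override
        public String toString() {
            return "(" + key + ", " + column1 + ", " + value + ")";
        }
    }

    // Turns one thrift row (row key plus its dynamic columns) into CQL rows.
    // The TreeMap approximates comparator ordering for plain ASCII names.
    static List<CqlRow> toCqlRows(String rowKey, Map<String, String> thriftColumns) {
        Map<String, String> sorted = new TreeMap<>(thriftColumns);
        List<CqlRow> rows = new ArrayList<>();
        for (Map.Entry<String, String> cell : sorted.entrySet()) {
            rows.add(new CqlRow(rowKey, cell.getKey(), cell.getValue()));
        }
        return rows;
    }

    // Made-up example: one thrift row with two timestamp-named columns.
    static List<CqlRow> demo() {
        Map<String, String> columns = new TreeMap<>();
        columns.put("2013-08-30T11:55", "b");
        columns.put("2013-08-30T09:50", "a");
        return toCqlRows("user42", columns);
    }

    public static void main(String[] args) {
        for (CqlRow row : demo()) {
            System.out.println(row);
        }
    }
}
```

Read this way, the timestamp-named columns from question 2 are simply
additional CQL rows under the same key, so no explicit column definitions
are needed for the dynamic case.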

--047d7b33d152c6c7f404e528e7c2--