Subject: Re: Performance problem with large wide row inserts using CQL
From: Edward Capriolo <edlinuxguru@gmail.com>
To: user@cassandra.apache.org
Date: Fri, 21 Feb 2014 09:34:42 -0500

The main issue is that Cassandra has two of everything: two access APIs, two metadata systems, and two groups of users.

Those groups of users using the original systems (Thrift, CFMetaData) and following the advice of three years ago have been labeled obsolete (did you ever see that Twilight Zone episode?).

If you suggest a Thrift-only feature, get ready to fight. People seem oblivious to the fact that you may have a 38-node cluster with 12 TB of data under compact storage, and that you can't just snap your fingers and adopt whatever new system to pack data that someone comes up with.

Earlier in the thread I detailed a potential way to store collection-like things in compact storage. You would just assume that with all the collective brain power in the project, somehow, some way, collections could make their way into compact storage. Or that the new language would offer similar features regardless of the storage chosen (like InnoDB and MariaDB, say).

The shelf life of Codd's normal form has been what, 30 or 40 years, and still going strong? I'm always rather pissed that 3 years after I started using Cassandra everything has changed, that I'm not the future, and that no one is really interested in supporting anything I used the datastore for.
On Friday, February 21, 2014, Sylvain Lebresne wrote:
> On Thu, Feb 20, 2014 at 10:49 PM, Rüdiger Klaehn wrote:
>>
>> Hi Sylvain,
>>
>> I applied the patch to the cassandra-2.0 branch (this required some manual work since I could not figure out which commit it was supposed to apply to, and it did not apply to the head of cassandra-2.0).
>
> Yeah, some commit yesterday made the patch not apply cleanly anymore. In any case, it's now committed to the cassandra-2.0 branch and will be part of 2.0.6.
>>
>> The benchmark now runs in pretty much identical time to the Thrift-based benchmark: ~30s for 1000 inserts of 10000 key/value pairs each. Great work!
>
> Glad that it helped.
>
>> I still have some questions regarding the mapping. Please bear with me if these are stupid questions; I am quite new to Cassandra.
>>
>> The basic Cassandra data model for a keyspace is something like this, right?
>>
>> SortedMap<byte[], SortedMap<byte[], Pair<Long, byte[]>>>
>>           ^ row key: determines which server(s) the rest is stored on
>>                             ^ column key
>>                                           ^ timestamp (latest one wins)
>>                                                 ^ value (can be size 0)
>
> It's a reasonable way to think of how things are stored internally, yes. Though as DuyHai mentioned, the first map is really sorting by token, and in general that means you mostly use the sorting of the second map concretely.
>
>> So if I have a table like the one in my benchmark (using blobs):
>>
>> CREATE TABLE IF NOT EXISTS test.wide (
>>   time blob,
>>   name blob,
>>   value blob,
>>   PRIMARY KEY (time, name))
>> WITH COMPACT STORAGE
>>
>> From reading http://www.datastax.com/dev/blog/thrift-to-cql3 it seems that
>>
>> - time maps to the row key and name maps to the column key without any overhead
>> - value maps directly to the value in the model above, without any prefix
>>
>> Is that correct, or is there some overhead involved in CQL over the raw model as described above? If so, where exactly?
>
> That's correct.
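The SortedMap-of-SortedMaps picture above can be sketched as runnable code. The following is a hedged illustration in plain Java — class names like `WideRowModel` and `Cell` are invented for the example, not Cassandra's actual classes — showing the "latest timestamp wins" reconciliation rule:

```java
import java.util.TreeMap;

// Sketch of the model quoted above:
// SortedMap<rowKey, SortedMap<columnKey, (timestamp, value)>>
public class WideRowModel {
    static class Cell {
        final long timestamp;
        final byte[] value;
        Cell(long timestamp, byte[] value) { this.timestamp = timestamp; this.value = value; }
    }

    // Unsigned lexicographic byte[] comparator so TreeMap sorts keys the
    // way a bytes-typed column comparator would.
    static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    final TreeMap<byte[], TreeMap<byte[], Cell>> rows =
            new TreeMap<>(WideRowModel::compareBytes);

    // "Latest timestamp wins": keep only the write with the higher timestamp.
    void put(byte[] rowKey, byte[] columnKey, long timestamp, byte[] value) {
        TreeMap<byte[], Cell> row =
                rows.computeIfAbsent(rowKey, k -> new TreeMap<>(WideRowModel::compareBytes));
        Cell existing = row.get(columnKey);
        if (existing == null || existing.timestamp < timestamp) {
            row.put(columnKey, new Cell(timestamp, value));
        }
    }

    public static void main(String[] args) {
        WideRowModel m = new WideRowModel();
        byte[] row = {1}, col = {2};
        m.put(row, col, 10L, new byte[]{42});
        m.put(row, col, 5L, new byte[]{7}); // older timestamp: ignored
        System.out.println(m.rows.get(row).get(col).value[0]); // prints 42
    }
}
```

This is only the logical view; as noted in the thread, the outer map really sorts by token of the row key, not by the raw key bytes.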
> For completeness' sake, if you were to remove COMPACT STORAGE there would be some overhead in how the table maps to the underlying column key, but that overhead would buy you much more flexibility in how you could evolve the schema (you could add more CQL columns later if need be, have collections, or have static columns following CASSANDRA-6561, which comes in 2.0.6; none of which you can have with COMPACT STORAGE). Note that it's perfectly fine to use COMPACT STORAGE if you know you don't and won't need the additional flexibility, but I generally advise people to first check that using COMPACT STORAGE makes a concrete and meaningful difference for their use case (be careful with premature optimization, really). The difference in performance/storage space used is not always all that noticeable in practice (note that I didn't say it's never noticeable!) and is narrowing as Cassandra evolves (it's not impossible at all that we will get to "never noticeable" someday, while COMPACT STORAGE tables will never get the flexibility of normal tables because there are backwards-compatibility issues). It's also my experience that, more often than not (again, not always), flexibility turns out to be more important in the long run than squeezing out every bit of performance you can (if it comes at the price of that flexibility, that is). Do what you want with that advice :)
> --
> Sylvain
>
>> kind regards and many thanks for your help,
>>
>> Rüdiger
>>
>> On Thu, Feb 20, 2014 at 8:36 AM, Sylvain Lebresne wrote:
>>>
>>> On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn wrote:
>>>>
>>>> I have cloned the cassandra repo, applied the patch, and built it. But when I want to run the benchmark I get an exception; see below. I tried with a non-managed dependency to cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I compiled from source because I read that that might help. But that did not make a difference.
>>>> So currently I don't know how to give the patch a try. Any ideas?
>>>>
>>>> cheers,
>>>>
>>>> Rüdiger
>>>>
>>>> Exception in thread "main" java.lang.IllegalArgumentException: replicate_on_write is not a column defined in this metadata
>>>>     at com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
>>>>     at com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
>>>>     at com.datastax.driver.core.Row.getBool(Row.java:117)
>>>>     at com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:474)
>>>>     at com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
>>>>     at com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
>>>>     at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
>>>>     at com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
>>>>     at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
>>>>     at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:161)
>>>>     at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
>>>>     at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:890)
>>>>     at com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:910)
>>>>     at com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:806)
>>>>     at com.datastax.driver.core.Cluster.connect(Cluster.java:158)
>>>>     at cassandra.CassandraTestMinimized$delayedInit$body.apply(CassandraTestMinimized.scala:31)
>>>>     at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>>>>     at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>>>>     at scala.App$$anonfun$main$1.apply(App.scala:71)
>>>>     at scala.App$$anonfun$main$1.apply(App.scala:71)
>>>>     at scala.collection.immutable.List.foreach(List.scala:318)
>>>>     at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
>>>>     at scala.App$class.main(App.scala:71)
>>>>     at cassandra.CassandraTestMinimized$.main(CassandraTestMinimized.scala:5)
>>>>     at cassandra.CassandraTestMinimized.main(CassandraTestMinimized.scala)
>>>
>>> I believe you've tried the cassandra trunk branch? trunk is basically the future Cassandra 2.1, and the driver is currently unhappy because the replicate_on_write option has been removed in that version. I'm supposed to have fixed that on the driver 2.0 branch like 2 days ago, so maybe you're also using a slightly old version of the driver sources in there? Or maybe I've screwed up my fix; I'll double check. But anyway, it would be overall simpler to test with the cassandra-2.0 branch of Cassandra, with which you shouldn't run into that.
>>> --
>>> Sylvain

--
Sorry, this was sent from mobile. Will do less grammar and spell check than usual.
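[Editor's note: to make the "overhead in how it maps to the underlying column key" point concrete, here is a hedged sketch of composite column-name framing. Each component is serialized as a 2-byte big-endian length prefix, the component bytes, and a trailing end-of-component byte; this mirrors the on-disk CompositeType layout as generally described for CQL3 tables, but the code is illustrative, not Cassandra's actual serializer.]

```java
import java.io.ByteArrayOutputStream;

// With COMPACT STORAGE, the clustering value ("name" in the table above)
// is the column key as-is. Without it, the column key is a composite of
// (clustering value, CQL column name), each component framed as below.
public class CompositeSketch {
    static byte[] encodeComposite(byte[]... components) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] c : components) {
            out.write((c.length >> 8) & 0xff); // length prefix, high byte
            out.write(c.length & 0xff);        // length prefix, low byte
            out.write(c, 0, c.length);         // component bytes
            out.write(0);                      // end-of-component byte
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] name = "name-bytes".getBytes();
        byte[] cqlColumn = "value".getBytes();
        // Compact storage: column key is just the clustering bytes.
        int compact = name.length;
        // Non-compact: composite of (clustering value, CQL column name).
        int nonCompact = encodeComposite(name, cqlColumn).length;
        System.out.println(nonCompact - compact); // prints 11
    }
}
```

So the per-cell cost of dropping COMPACT STORAGE here is the framing bytes plus the repeated CQL column name — small per cell, but multiplied across every cell of a wide row, which is what Sylvain's "not always all that noticeable, but sometimes" caveat is weighing.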