Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of edlinuxguru@gmail.com
 designates 74.125.82.43 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAKkz8Q0-OCS2aPf7efUTQ6mzKZ=61suk5j7kwhVUKr6_yL1CZA@mail.gmail.com>
References: 
 <CAGA++nkTyQXKCsBe=Dzjc8Cy4TpWwJc0ARY-0nLC0iqXdR0jfQ@mail.gmail.com>
	<CAKkz8Q2UwYy6AbRjFwoavFLcr5jEHELR-QYcmuXVJ7rG6nxwwQ@mail.gmail.com>
	<CAGA++nmaJJ4QT8ygMxQgALmqiEb_meW_wHKEcELM602Kyjvt6A@mail.gmail.com>
	<CAKkz8Q1UM92Ga0+eNaxV5KzXcS+hcSA87cCGc12LMP=CWCppog@mail.gmail.com>
	<CAENxBwwm9CRiOp_zs-2ru3bezTg5FznMxZGMt8GjBs-SDjghww@mail.gmail.com>
	<CAKkz8Q3Q8hfG1jQ23kj4GJM+i7xMiG56Oczm7PQ=w1rsjiOOtw@mail.gmail.com>
	<CAKv2g8fLCOPjwZ4bEsVsS6NTt2OrHGAAf72mFkeM+hZToRckgw@mail.gmail.com>
	<CAKkz8Q0-OCS2aPf7efUTQ6mzKZ=61suk5j7kwhVUKr6_yL1CZA@mail.gmail.com>
Date: Thu, 20 Feb 2014 14:31:46 -0500
Message-ID: 
 <CAENxBwzYBQcJRDYYuCZ15x1nVBCVoFxO9EitZtzNCfjfGanPmg@mail.gmail.com>
Subject: Re: Performance problem with large wide row inserts using CQL
From: Edward Capriolo <edlinuxguru@gmail.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Content-Type: multipart/alternative; boundary=089e01176b1d4f480104f2db8eb3

--089e01176b1d4f480104f2db8eb3
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

The only thing you really cannot do: CQL3 loses some of the concept of CQL2
metadata, namely the default validation and then column-specific
validation.

In cassandra-cql we can say (butchering the syntax)

create column family x
DEFAULT_VALIDATOR = UTF8Type
columns named y are int
columns named z are string

You can do this in CQL:
create table x (
rowkey blob,
column blob,
value blob,
primary key (rowkey, column)
) with compact storage;

But you lose the concept of "columns named y validate as int". Everything is
just a blob as far as CQL understands it. That being said, in the schema
presented nothing stops the user from implementing their design in either
"system".
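
To make that last point concrete, here is a minimal sketch (Python, purely
illustrative) of doing the per-column-name validation on the client side
before writing blobs into an all-blob table. COLUMN_TYPES, encode_value and
decode_value are hypothetical names, not part of any driver, and the 4-byte
big-endian int encoding is an assumption chosen for the example.

```python
import struct

# Hypothetical client-side rules: "columns named y are int, columns named z
# are string", with UTF-8 text as the default for every other column name.
DEFAULT_TYPE = "utf8"
COLUMN_TYPES = {"y": "int", "z": "utf8"}


def encode_value(column_name, value):
    """Validate a value against its column's rule and return blob bytes."""
    kind = COLUMN_TYPES.get(column_name, DEFAULT_TYPE)
    if kind == "int":
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"column {column_name!r} expects an int")
        return struct.pack(">i", value)  # assumed: 4-byte big-endian signed int
    if not isinstance(value, str):
        raise TypeError(f"column {column_name!r} expects a string")
    return value.encode("utf-8")


def decode_value(column_name, blob):
    """Turn blob bytes read back from the table into a typed value."""
    kind = COLUMN_TYPES.get(column_name, DEFAULT_TYPE)
    if kind == "int":
        if len(blob) != 4:
            raise ValueError(f"column {column_name!r} expects 4 bytes")
        return struct.unpack(">i", blob)[0]
    return blob.decode("utf-8")
```

The bytes returned by encode_value would be bound as the "value" blob of an
insert, and decode_value applied when reading rows back. The database never
sees the rule, which is exactly the metadata that is lost.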


On Thu, Feb 20, 2014 at 12:46 PM, Sylvain Lebresne <sylvain@datastax.com> wrote:

> On Thu, Feb 20, 2014 at 6:26 PM, Peter Lin <woolfel@gmail.com> wrote:
>
>>
>> I disagree with the sentiment that "thrift is not worth the trouble".
>>
>
> Way to quote only part of my sentence and get mental on it. My full
> sentence was "it's probably not worth the trouble to start with thrift if
> you're gonna use CQL later".
>
>
>>
>> CQL and all SQL inspired dialects limit one's ability to use arbitrary
>> typed data in dynamic columns. With thrift it's easy and straight forward.
>> With CQL there is no way to tell Cassandra the type of the name and value
>> for a dynamic column. You can only set the default type. That means using a
>> "pure cql" approach you can deviate from the default type. Cassandra will
>> throw an exception indicating the type is different than the default type.
>>
>
>> Until such time that CQL abandons the shackles of SQL and adds the
>> ability to indicate the column and value type. Something like this
>>
>
>> insert into myColumnFamily(staticColumn1, staticColumn2, 20 as int,
>> dynamicColumn as string) into ('text1','text2',30.55 as double, 3500 as
>> long)
>>
>> This is one area where Thrift is superior to CQL. Having said that, it's
>> valid to use Cassandra "as if" it was a relational database, but then you'd
>> miss out on some of the unique features.
>>
>
> Man, if I had a nickel every time someone came on that mailing list
> pretending that something was possible with thrift and not CQL ... I will
> claim this: with CASSANDRA-6561 and CASSANDRA-4851 that just got in, there
> is *nothing* that thrift can do that CQL cannot. But well, what do I know
> about Cassandra.
>
> --
> Sylvain
>
>
>
>>
>>
>>
>>
>> On Thu, Feb 20, 2014 at 12:12 PM, Sylvain Lebresne <sylvain@datastax.com> wrote:
>>
>>> On Thu, Feb 20, 2014 at 2:16 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>>
>>>> For what it is worth your schema is simple and uses compact storage.
>>>> Thus you really don't need anything in cassandra 2.0 as far as I can tell.
>>>> You might be happier with a stable release like 1.2.something and just
>>>> hector or astyanax. You are really dealing with many issues you should not
>>>> have to just to prototype a simple cassandra app.
>>>
>>>
>>>
>>> Of course, if everyone was using that reasoning, no-one would ever test
>>> new features and report problems/suggest improvement. So thanks to anyone
>>> like Rüdiger that actually tries stuff and take the time to report problems
>>> when they think they encounter one. Keep at it, *you* are the one helping
>>> Cassandra to get better everyday.
>>>
>>> And you are also right Rüdiger that it's probably not worth the trouble
>>> to start with thrift if you're gonna use CQL later. And you definitively
>>> should use CQL, it is Cassandra's future.
>>>
>>> --
>>> Sylvain
>>>
>>>
>>>
>>>>
>>>> On Thursday, February 20, 2014, Sylvain Lebresne <sylvain@datastax.com> wrote:
>>>> >
>>>> >
>>>> >
>>>> > On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn <rklaehn@gmail.com> wrote:
>>>> >>
>>>> >> I have cloned the cassandra repo, applied the patch, and built it.
>>>> >> But when I want to run the benchmark I get an exception. See below.
>>>> >> I tried with a non-managed dependency to
>>>> >> cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar,
>>>> >> which I compiled from source because I read that that might help.
>>>> >> But that did not make a difference.
>>>> >>
>>>> >> So currently I don't know how to give the patch a try. Any ideas?
>>>> >>
>>>> >> cheers,
>>>> >>
>>>> >> Rüdiger
>>>> >>
>>>> >> Exception in thread "main" java.lang.IllegalArgumentException: replicate_on_write is not a column defined in this metadata
>>>> >>     at com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
>>>> >>     at com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
>>>> >>     at com.datastax.driver.core.Row.getBool(Row.java:117)
>>>> >>     at com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:474)
>>>> >>     at com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
>>>> >>     at com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
>>>> >>     at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
>>>> >>     at com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
>>>> >>     at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
>>>> >>     at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:161)
>>>> >>     at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
>>>> >>     at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:890)
>>>> >>     at com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:910)
>>>> >>     at com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:806)
>>>> >>     at com.datastax.driver.core.Cluster.connect(Cluster.java:158)
>>>> >>     at cassandra.CassandraTestMinimized$delayedInit$body.apply(CassandraTestMinimized.scala:31)
>>>> >>     at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>>>> >>     at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>>>> >>     at scala.App$$anonfun$main$1.apply(App.scala:71)
>>>> >>     at scala.App$$anonfun$main$1.apply(App.scala:71)
>>>> >>     at scala.collection.immutable.List.foreach(List.scala:318)
>>>> >>     at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
>>>> >>     at scala.App$class.main(App.scala:71)
>>>> >>     at cassandra.CassandraTestMinimized$.main(CassandraTestMinimized.scala:5)
>>>> >>     at cassandra.CassandraTestMinimized.main(CassandraTestMinimized.scala)
>>>> >
>>>> > I believe you've tried the cassandra trunk branch? trunk is basically
>>>> > the future Cassandra 2.1 and the driver is currently unhappy because the
>>>> > replicate_on_write option has been removed in that version. I'm supposed
>>>> > to have fixed that on the driver 2.0 branch like 2 days ago so maybe
>>>> > you're also using a slightly old version of the driver sources in there?
>>>> > Or maybe I've screwed up my fix, I'll double check. But anyway, it would
>>>> > be overall simpler to test with the cassandra-2.0 branch of Cassandra,
>>>> > with which you shouldn't run into that.
>>>> > --
>>>> > Sylvain
>>>>
>>>> --
>>>> Sorry this was sent from mobile. Will do less grammar and spell check
>>>> than usual.
>>>>
>>>
>>>
>>
>

--089e01176b1d4f480104f2db8eb3--