incubator-cassandra-user mailing list archives

From Rüdiger Klaehn <rkla...@gmail.com>
Subject Re: Performance problem with large wide row inserts using CQL
Date Thu, 20 Feb 2014 21:49:45 GMT
Hi Sylvain,

I applied the patch to the cassandra-2.0 branch (this required some manual
work since I could not figure out which commit it was supposed to apply
to, and it did not apply cleanly to the head of cassandra-2.0).

The benchmark now runs in pretty much the same time as the Thrift-based
benchmark: ~30s for 1000 inserts of 10000 key/value pairs each. Great work!


I still have some questions regarding the mapping. Please bear with me if
these are stupid questions. I am quite new to Cassandra.

The basic Cassandra data model for a keyspace is something like this, right?

SortedMap<byte[], SortedMap<byte[], Pair<Long, byte[]>>>
          ^ row key: determines which server(s) the rest is stored on
                            ^ column key
                                         ^ timestamp (latest one wins)
                                               ^ value (can be size 0)
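
For illustration, the model above can be sketched in plain Java roughly as follows. This is only an in-memory sketch under my own assumptions, not Cassandra's storage code: the names KeyspaceModelSketch, Cell, put and get are hypothetical, keys are ordered as unsigned bytes, and a timestamp tie simply favours the later write, which is a simplification.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory sketch of the model; not Cassandra's actual storage code.
final class KeyspaceModelSketch {
    /** One cell: write timestamp plus value bytes (the value may be empty). */
    static final class Cell {
        final long timestamp;
        final byte[] value;
        Cell(long timestamp, byte[] value) { this.timestamp = timestamp; this.value = value; }
    }

    // byte[] has no natural ordering, so keys are sorted as unsigned bytes (Java 9+).
    static final Comparator<byte[]> UNSIGNED_BYTES = Arrays::compareUnsigned;

    // row key -> (column key -> cell)
    private final SortedMap<byte[], SortedMap<byte[], Cell>> rows = new TreeMap<>(UNSIGNED_BYTES);

    /** Writes a cell; an existing cell is replaced only if the new timestamp is not older. */
    void put(byte[] rowKey, byte[] columnKey, long timestamp, byte[] value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<byte[], Cell>(UNSIGNED_BYTES))
            .merge(columnKey, new Cell(timestamp, value),
                   (old, fresh) -> fresh.timestamp >= old.timestamp ? fresh : old);
    }

    /** Returns the current value for (rowKey, columnKey), or null if absent. */
    byte[] get(byte[] rowKey, byte[] columnKey) {
        SortedMap<byte[], Cell> columns = rows.get(rowKey);
        Cell cell = columns == null ? null : columns.get(columnKey);
        return cell == null ? null : cell.value;
    }

    static byte[] utf8(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        KeyspaceModelSketch ks = new KeyspaceModelSketch();
        ks.put(utf8("row1"), utf8("colA"), 2L, utf8("newer"));
        ks.put(utf8("row1"), utf8("colA"), 1L, utf8("older")); // lower timestamp, ignored
        // prints "newer"
        System.out.println(new String(ks.get(utf8("row1"), utf8("colA")), StandardCharsets.UTF_8));
    }
}
```

The second write carries the lower timestamp, so the first value stays visible; that is the "latest one wins" rule from the diagram.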

So if I have a table like the one in my benchmark (using blobs)

CREATE TABLE IF NOT EXISTS test.wide (
  time blob,
  name blob,
  value blob,
  PRIMARY KEY (time,name))
  WITH COMPACT STORAGE

From reading http://www.datastax.com/dev/blog/thrift-to-cql3 it seems that

- time maps to the row key and name maps to the column key without any
overhead
- value directly maps to value in the model above without any prefix

Is that correct, or does CQL add some overhead on top of the raw model
described above? If so, where exactly?
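
To make the question concrete, here is a small Java sketch of the mapping as I understand it from the blog post. It assumes, without confirmation, that each CQL row (time, name, value) of test.wide becomes one column under the row key time. CompactStorageMappingSketch, CqlRow and toStorageLayout are hypothetical names, and timestamps are left out for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of the assumed CQL-to-storage mapping for test.wide.
final class CompactStorageMappingSketch {
    /** Holder for one CQL row of test.wide: (time, name, value). */
    static final class CqlRow {
        final byte[] time;
        final byte[] name;
        final byte[] value;
        CqlRow(byte[] time, byte[] name, byte[] value) {
            this.time = time;
            this.name = name;
            this.value = value;
        }
    }

    // Keys are ordered as unsigned bytes (Arrays.compareUnsigned needs Java 9+).
    static final Comparator<byte[]> UNSIGNED_BYTES = Arrays::compareUnsigned;

    /** Folds CQL rows into row key (time) -> column key (name) -> value. */
    static SortedMap<byte[], SortedMap<byte[], byte[]>> toStorageLayout(List<CqlRow> cqlRows) {
        SortedMap<byte[], SortedMap<byte[], byte[]>> storage = new TreeMap<>(UNSIGNED_BYTES);
        for (CqlRow row : cqlRows) {
            storage.computeIfAbsent(row.time, k -> new TreeMap<byte[], byte[]>(UNSIGNED_BYTES))
                   .put(row.name, row.value);
        }
        return storage;
    }

    static byte[] utf8(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        List<CqlRow> cqlRows = Arrays.asList(
                new CqlRow(utf8("t1"), utf8("a"), utf8("1")),
                new CqlRow(utf8("t1"), utf8("b"), utf8("2")),
                new CqlRow(utf8("t2"), utf8("a"), utf8("3")));
        SortedMap<byte[], SortedMap<byte[], byte[]>> storage = toStorageLayout(cqlRows);
        // prints "2 storage rows, 2 columns under t1"
        System.out.println(storage.size() + " storage rows, "
                + storage.get(utf8("t1")).size() + " columns under t1");
    }
}
```

Under that assumption, three CQL rows with two distinct time values collapse into two wide rows.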

kind regards and many thanks for your help,

Rüdiger


On Thu, Feb 20, 2014 at 8:36 AM, Sylvain Lebresne <sylvain@datastax.com> wrote:

>
>
>
> On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn <rklaehn@gmail.com> wrote:
>
>>
>> I have cloned the cassandra repo, applied the patch, and built it. But
>> when I want to run the benchmark I get an exception. See below. I tried with
>> a non-managed dependency to
>> cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I
>> compiled from source because I read that that might help. But that did not
>> make a difference.
>>
>> So currently I don't know how to give the patch a try. Any ideas?
>>
>> cheers,
>>
>> Rüdiger
>>
>> Exception in thread "main" java.lang.IllegalArgumentException: replicate_on_write is not a column defined in this metadata
>>     at com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
>>     at com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
>>     at com.datastax.driver.core.Row.getBool(Row.java:117)
>>     at com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:474)
>>     at com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
>>     at com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
>>     at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
>>     at com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
>>     at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
>>     at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:161)
>>     at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
>>     at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:890)
>>     at com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:910)
>>     at com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:806)
>>     at com.datastax.driver.core.Cluster.connect(Cluster.java:158)
>>     at cassandra.CassandraTestMinimized$delayedInit$body.apply(CassandraTestMinimized.scala:31)
>>     at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>>     at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>>     at scala.App$$anonfun$main$1.apply(App.scala:71)
>>     at scala.App$$anonfun$main$1.apply(App.scala:71)
>>     at scala.collection.immutable.List.foreach(List.scala:318)
>>     at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
>>     at scala.App$class.main(App.scala:71)
>>     at cassandra.CassandraTestMinimized$.main(CassandraTestMinimized.scala:5)
>>     at cassandra.CassandraTestMinimized.main(CassandraTestMinimized.scala)
>>
>
> I believe you've tried the cassandra trunk branch? trunk is basically the
> future Cassandra 2.1 and the driver is currently unhappy because the
> replicate_on_write option has been removed in that version. I'm supposed to
> have fixed that on the driver 2.0 branch like 2 days ago so maybe you're
> also using a slightly old version of the driver sources in there? Or maybe
> I've screwed up my fix, I'll double check. But anyway, it would be overall
> simpler to test with the cassandra-2.0 branch of Cassandra, with which you
> shouldn't run into that.
>
> --
> Sylvain
>
