cassandra-user mailing list archives

From "Jeremiah Jordan" <>
Subject RE: custom reconciling columns?
Date Thu, 30 Jun 2011 15:27:36 GMT
The reason to break it up is that the information will then be on
different servers, so you can have server 1 spending time retrieving row
1, while you have server 2 retrieving row 2, and server 3 retrieving row
3...  So instead of getting 3000 things from one server, you get 1000
from 3 servers in parallel...
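
A minimal sketch of that fan-out, in Python for brevity. Note that
`fetch_row` is a hypothetical stand-in for whatever client call reads one
row (it is not a Cassandra API), and `FAKE_STORE` exists only so the
sketch runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for three rows living on three servers.
FAKE_STORE = {
    "row1": ["a"] * 1000,
    "row2": ["b"] * 1000,
    "row3": ["c"] * 1000,
}


def fetch_row(row_key):
    """Placeholder for a real client read of one row (assumption)."""
    return FAKE_STORE[row_key]


def fetch_rows_parallel(row_keys):
    """Issue one read per row concurrently; return columns in key order."""
    if not row_keys:
        return []
    with ThreadPoolExecutor(max_workers=len(row_keys)) as pool:
        # pool.map yields results in the same order as row_keys.
        chunks = list(pool.map(fetch_row, row_keys))
    return [col for chunk in chunks for col in chunk]


columns = fetch_rows_parallel(["row1", "row2", "row3"])
```

With a real client, each `fetch_row` call would mostly wait on the
network, so the three reads overlap instead of running back to back.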


From: Yang
Sent: Wednesday, June 29, 2011 12:07 AM
Subject: Re: custom reconciling columns?

ok, here is the profiling result (see attached). I think it is
consistent; I have been trying to recall how to use YourKit effectively.

Since I do not actually use the Thrift interface, but call
thrift.CassandraServer directly and run my code in the same JVM as
Cassandra, and since I was running the whole thing on a single box,
there is no message
serialization/deserialization cost. but more columns did add on to more

the time was spent in the ConcurrentSkipListMap operations that
implement the memtable. 

Regarding breaking up the row, I'm not sure it would reduce my run time,
since our requirement is to read the entire rolling-window history. (We
already have TTL enabled, so the history is limited to a certain length,
but it is quite long: over 1000 columns, and in some cases 5000 or
more.) I think accessing roughly 1000 items is not an uncommon
requirement for many applications. In our case, each column has about 30
bytes of data, besides metadata such as the TTL and timestamp.
At a history length of 3000, the read takes about 12 ms (remember this
is completely in-memory, no disk access).

I just took a look at the expiring column logic. It looks like
expiration does not come into play until
CassandraServer.internal_get() ==> thriftifyColumns() gets called, so
the above memtable access time is still spent. So yes, breaking up the
row is going to be helpful, but only to the degree that it avoids
accessing expired columns. (btw, it would be nicer if this were built
into the Cassandra code: instead of spending multiple key lookups, I
would locate the row once, and within the row there would be different
"generation" buckets, so that old generation buckets beyond expiration
are not read.) Currently, just accessing the 3000 live columns is
already quite slow.

I'm trying to see whether there is an easy magic bullet in the form of a
drop-in replacement for ConcurrentSkipListMap...


On Tue, Jun 28, 2011 at 4:18 PM, Nate McCall <> wrote:

	I agree with Aaron's suggestion on data model and query here.
	there is a time component, you can split the row on a fixed
	for a given user, so the row key would become userId_[timestamp
	rounded to day].
	This provides you an easy way to roll up the information for the
	ranges you need since the key suffix can be created without a
	This also benefits from spreading the read load over the cluster
	instead of just the replicas since you have 30 rows in this case
	instead of one.
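
A small sketch of how such day-bucketed row keys could be generated on
the client. The `YYYYMMDD` suffix format, the use of UTC, and the helper
names `day_bucket_key` and `keys_for_window` are assumptions made for
illustration; the thread only specifies userId_[timestamp rounded to
day].

```python
from datetime import datetime, timedelta, timezone


def day_bucket_key(user_id, ts):
    """Row key of the form userId_[timestamp rounded to day], in UTC."""
    day = ts.astimezone(timezone.utc).strftime("%Y%m%d")
    return f"{user_id}_{day}"


def keys_for_window(user_id, end, days):
    """Row keys for the `days` most recent day buckets, oldest first."""
    return [
        day_bucket_key(user_id, end - timedelta(days=offset))
        for offset in range(days - 1, -1, -1)
    ]


end = datetime(2011, 6, 30, tzinfo=timezone.utc)
keys = keys_for_window("user42", end, 30)
```

Because every key is computed from the user ID and a date, a month of
history becomes 30 keys that can be requested together in one multi-row
read.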

	On Tue, Jun 28, 2011 at 5:55 PM, aaron morton <> wrote:
	> Can you provide some more info:
	> - how big are the rows, e.g. number of columns and column size
	> - how much data are you asking for ?
	> - what sort of read query are you using ?
	> - what sort of numbers are you seeing ?
	> - are you deleting columns or using TTL ?
	> I would consider issues with the data churn, data model and query
	> before looking at serialisation.
	> Cheers
	> -----------------
	> Aaron Morton
	> Freelance Cassandra Developer
	> @aaronmorton
	> On 29 Jun 2011, at 10:37, Yang wrote:
	> I can see that as my user history grows, the read time grows
	> proportionally (or faster than linearly).
	> If my business requirements ask me to keep a month's history for
	> each user, it could become too slow. I was suspecting that it's
	> actually the serializing and deserializing that's taking time (I can
	> definitely see it's CPU bound)
	> On Tue, Jun 28, 2011 at 3:04 PM, aaron morton wrote:
	>> There is no facility to do custom reconciliation for a column. An
	>> append style operation would run into many of the same problems as
	>> the Counter type, e.g. not every node may get an append and there is
	>> a chance for lost appends unless you go to all the trouble Counters do.
	>> I would go with using a row for the user and columns for each item.
	>> Then you can have fast no-look writes.
	>> What problems are you seeing with the reads ?
	>> Cheers
	>> -----------------
	>> Aaron Morton
	>> Freelance Cassandra Developer
	>> @aaronmorton
	>> On 29 Jun 2011, at 04:20, Yang wrote:
	>> > for example, if I have an application that needs to read off a user
	>> > browsing history, and I model the user ID as the key,
	>> > and the history data within the row. with current approach, I could
	>> > model each visit as a column,
	>> > the possible issue is that *possibly* (I'm still doing a lot of
	>> > profiling on this to verify) that a lot of time is spent on
	>> > into the message and out of the
	>> > message, plus I do not need the full features provided by the column:
	>> > for example I do not need a timestamp on each visit, etc,
	>> > so it might be faster to put the entire history in a blob, and each
	>> > visit only takes up a few bytes in the blob, and
	>> > my code manipulates the blob.
	>> >
	>> > problem is, I still need to avoid the read-before-write, so I send only
	>> > the latest visit, and let cassandra do the reconcile, which appends the
	>> > visit to the blob, so this needs custom reconcile behavior.
	>> >
	>> > is there a way to incorporate such custom reconcile under current code
	>> > framework? (I see custom sorting, but no custom reconcile)
	>> >
	>> > thanks
	>> > yang
