cassandra-user mailing list archives

From Kevin Burton <>
Subject Re: Memory overhead of vector clocks…. how often are they pruned?
Date Wed, 24 Aug 2011 17:41:31 GMT
This is really interesting… I can track them down, but there are a number of
references to Cassandra HAVING vector clocks. If it doesn't have them, that
would explain why I can't find out how much memory they are using :-P

"Cassandra: The Definitive Guide" … which I was reading the other night says
that they were introduced in 0.7 but that they're still figuring out what to
do with them: "…'s clock was introduced in version 0.7, but its fate is uncertain"

… so… are 'timestamps' pruned?

Even this mechanism seems like it will dominate the amount of memory used in
Cassandra.  I could see many installs requiring 2-3x more memory to run
Cassandra unless there is a pruning mechanism or some way to minimize their


On Wed, Aug 24, 2011 at 9:05 AM, Ryan King <> wrote:

> On Tue, Aug 23, 2011 at 7:58 PM, Kevin Burton <> wrote:
>> I had a thread going the other day about vector clock memory usage: a
>> vector clock is a series of (client id, clock) entries plus a timestamp,
>> which makes it possible to prune old entries … I'm specifically curious
>> how often old entries are pruned.
>> If you're storing small columns within Cassandra, say just an integer,
>> the vector clock overhead could easily use up far more space than the data
>> actually in your database.
>> However, if they are pruned, then this shouldn't really be a problem.
>> How much memory is this wasting?
> I think there is some confusion here– cassandra doesn't use vector clocks.
> -ryan
>> Thoughts?
>> Jonathan Ellis to user, Aug 19 (4 days ago):
>>  The problem with naive last write wins is that writes don't always
>> arrive at each replica in the same order.  So no, that's a
>> non-starter.
>> Vector clocks are a series of (client id, clock) entries, and usually
>> a timestamp so you can prune old entries.  Obviously implementations
>> can vary, but to pick a specific example, Voldemort [1] uses 2 bytes
>> per client id, a variable number (at least one) of bytes for the
>> clock, and 8 bytes for the timestamp.
>> [1]
>> --
>> Founder/CEO
>> Location: *San Francisco, CA*
>> Skype: *burtonator*
>> Skype-in: *(415) 871-0687*
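
To put rough numbers on the overhead question, here is a minimal sketch based only on the Voldemort entry sizes quoted above (2 bytes per client id, at least 1 byte for the clock, 8 bytes for the timestamp). It assumes a single timestamp per vector clock rather than one per entry, and the function names are hypothetical; this is not Voldemort or Cassandra code, and per Ryan's reply Cassandra does not use vector clocks at all.

```python
# Back-of-the-envelope size estimate for a Voldemort-style vector clock.
# Sizes come from the figures quoted in the thread; everything else
# (names, one timestamp per clock) is an assumption for illustration.

CLIENT_ID_BYTES = 2   # per entry
MIN_CLOCK_BYTES = 1   # per entry, variable length, at least one byte
TIMESTAMP_BYTES = 8   # assumed: one timestamp per vector clock


def vector_clock_bytes(num_entries, clock_bytes=MIN_CLOCK_BYTES):
    """Estimated serialized size of a vector clock with num_entries entries."""
    if num_entries < 0:
        raise ValueError("num_entries must be non-negative")
    if clock_bytes < MIN_CLOCK_BYTES:
        raise ValueError("clock_bytes must be at least 1")
    return num_entries * (CLIENT_ID_BYTES + clock_bytes) + TIMESTAMP_BYTES


def overhead_ratio(value_bytes, num_entries, clock_bytes=MIN_CLOCK_BYTES):
    """Vector clock size divided by the size of the stored value."""
    if value_bytes <= 0:
        raise ValueError("value_bytes must be positive")
    return vector_clock_bytes(num_entries, clock_bytes) / value_bytes


if __name__ == "__main__":
    # A 4-byte integer value written by three distinct clients.
    print(vector_clock_bytes(3))
    print(overhead_ratio(4, 3))
```

Under these assumptions, a 4-byte integer touched by three clients carries 17 bytes of clock, so the clock is several times larger than the value itself unless old entries are pruned.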


