cassandra-user mailing list archives

From Kevin Burton <bur...@spinn3r.com>
Subject Re: Memory overhead of vector clocks…. how often are they pruned?
Date Wed, 24 Aug 2011 17:41:31 GMT
This is really interesting… I can track them down, but there are a number of
references to Cassandra HAVING vector clocks … though if it doesn't, that would
explain why I can't find out how much memory they are using :-P

"Cassandra: The Definitive Guide" … which I was reading the other night says
that they were introduced in 0.7 but that they're still figuring out what to
do with them:

http://books.google.com/books?id=MKGSbCbEdg0C&pg=PA50&lpg=PA50&dq=Cassandra's+clock+was+introduced+in+version+0.7,+but+its+fate+is+uncertain&source=bl&ots=XoQz3tFa1C&sig=Lhdu5j1xRcTPmP4-YQONhxzfRTU&hl=en&ei=MzdVTurWEJTSiAKU5vXoDA&sa=X&oi=book_result&ct=result&resnum=1&ved=0CBkQ6AEwAA#v=onepage&q&f=false

… so… are 'timestamps' pruned?

Even this timestamp mechanism seems like it could dominate the amount of memory
used in Cassandra.  I could see many installs requiring 2-3x more memory to run
Cassandra unless there is a pruning mechanism or some way to minimize their
use.
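
As a rough sanity check on the vector clock side, here is a minimal sketch that
uses only the Voldemort field sizes quoted further down in this thread (2 bytes
per client id, at least 1 byte per clock, 8 bytes for the timestamp).  The
function names and the 4-byte integer value are assumptions for illustration,
and it ignores any header or framing bytes, so treat the result as a lower
bound rather than what Voldemort actually writes.

```python
# Back-of-the-envelope estimate only. Field sizes come from the Voldemort
# description quoted in this thread; all names here are hypothetical.
CLIENT_ID_BYTES = 2   # bytes per client id
MIN_CLOCK_BYTES = 1   # clock is variable-length, at least one byte
TIMESTAMP_BYTES = 8   # one timestamp per vector clock, used for pruning


def min_vector_clock_bytes(entries: int) -> int:
    """Lower bound on the size of one vector clock holding `entries`
    (client id, clock) pairs plus its single timestamp."""
    if entries < 0:
        raise ValueError("entries must be non-negative")
    return entries * (CLIENT_ID_BYTES + MIN_CLOCK_BYTES) + TIMESTAMP_BYTES


def overhead_ratio(entries: int, value_bytes: int) -> float:
    """Vector clock bytes per byte of stored value (assumed value size)."""
    if value_bytes <= 0:
        raise ValueError("value_bytes must be positive")
    return min_vector_clock_bytes(entries) / value_bytes


if __name__ == "__main__":
    # Assumed scenario: a column holding a single 4-byte integer.
    for entries in (1, 3, 10):
        size = min_vector_clock_bytes(entries)
        ratio = overhead_ratio(entries, 4)
        print(f"{entries} entries: at least {size} bytes, {ratio:.2f}x the value")
```

With 3 entries the lower bound is 17 bytes, a bit over 4x a 4-byte integer,
which is why pruning matters so much for small values.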

Kevin


On Wed, Aug 24, 2011 at 9:05 AM, Ryan King <ryan@twitter.com> wrote:

> On Tue, Aug 23, 2011 at 7:58 PM, Kevin Burton <burton@spinn3r.com> wrote:
>
>> I had a thread going the other day about vector clock memory usage: a
>> vector clock is a series of (client id, clock) entries plus a timestamp,
>> which allows pruning old entries … I'm specifically curious here how often
>> old entries are pruned.
>>
>> If you're storing small columns within Cassandra, say just an integer,
>> the vector clock overhead could easily take up far more space than the
>> data actually in your database.
>>
>> However, if they are pruned, then this shouldn't really be a problem.
>>
>> How much memory is this wasting?
>>
>
> I think there is some confusion here: Cassandra doesn't use vector clocks.
>
> -ryan
>
>
>> Thoughts?
>>
>>
>> Jonathan Ellis <jbellis@gmail.com> wrote to user (Aug 19):
>>
>>  The problem with naive last write wins is that writes don't always
>> arrive at each replica in the same order.  So no, that's a
>> non-starter.
>>
>> Vector clocks are a series of (client id, clock) entries, and usually
>> a timestamp so you can prune old entries.  Obviously implementations
>> can vary, but to pick a specific example, Voldemort [1] uses 2 bytes
>> per client id, a variable number (at least one) of bytes for the
>> clock, and 8 bytes for the timestamp.
>>
>> [1]
>> https://github.com/voldemort/voldemort/blob/master/src/java/voldemort/versioning/VectorClock.java
>>
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>>
>> Location: *San Francisco, CA*
>> Skype: *burtonator*
>>
>> Skype-in: *(415) 871-0687*
>>
>>
>


-- 

Founder/CEO Spinn3r.com

Location: *San Francisco, CA*
Skype: *burtonator*

Skype-in: *(415) 871-0687*
