cassandra-user mailing list archives

From Ryan King <r...@twitter.com>
Subject Re: Memory overhead of vector clocks…. how often are they pruned?
Date Wed, 24 Aug 2011 22:11:50 GMT
We did have a Clock construct for a while, but it never made it into a
released version (afaik). We thought about using them for counters.

Timestamps are endemic to the data model and therefore can never be
pruned. Cassandra basically trades memory for availability here.
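
A minimal Python sketch of what that means in practice, assuming each column carries one client-supplied timestamp and a replica keeps whichever version has the higher one. The `Column` class, the `reconcile` function, and the tie-break on value are illustrative assumptions, not Cassandra's actual code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: bytes
    timestamp: int  # client-supplied, e.g. microseconds since the epoch


def reconcile(a: Column, b: Column) -> Column:
    """Pick the surviving version of one column: the higher timestamp wins."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Assumed tie-break: compare values so every replica picks the same winner.
    return a if a.value >= b.value else b


old = Column("count", b"\x00\x00\x00\x01", timestamp=1314223910000000)
new = Column("count", b"\x00\x00\x00\x02", timestamp=1314223911000000)

# The result is the same regardless of the order in which the writes arrive.
assert reconcile(old, new) == reconcile(new, old) == new
```

Because the timestamp is the only thing that decides which write survives, it has to stay with the column for as long as the column exists.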

-ryan

On Wed, Aug 24, 2011 at 10:54 AM, Jeremy Hanna
<jeremy.hanna1234@gmail.com> wrote:
> At the point that book was written (about a year ago it was finalized), vector clocks
> were planned.  In August or September of last year, they were removed.  0.7 was released
> in January.  The ticket for vector clocks is here and you can see the reasoning for not using
> them at the bottom.  https://issues.apache.org/jira/browse/CASSANDRA-580
>
> On Aug 24, 2011, at 12:41 PM, Kevin Burton wrote:
>
>> This is really interesting… I can track it down but there are a number of references
>> to Cassandra HAVING vector clocks … which would explain why I can't find out how much
>> memory they are using :-P
>>
>> "Cassandra: The Definitive Guide" … which I was reading the other night says that
>> they were introduced in 0.7 but that they're still figuring out what to do with them:
>>
>> http://books.google.com/books?id=MKGSbCbEdg0C&pg=PA50&lpg=PA50&dq=Cassandra's+clock+was+introduced+in+version+0.7,+but+its+fate+is+uncertain&source=bl&ots=XoQz3tFa1C&sig=Lhdu5j1xRcTPmP4-YQONhxzfRTU&hl=en&ei=MzdVTurWEJTSiAKU5vXoDA&sa=X&oi=book_result&ct=result&resnum=1&ved=0CBkQ6AEwAA#v=onepage&q&f=false
>>
>> … so… are 'timestamps' pruned?
>>
>> Even this mechanism seems like it will dominate the amount of memory used in Cassandra.
>> I could see many installs requiring 2-3x more memory to run Cassandra unless there is a
>> pruning mechanism or some way to minimize their use.
>>
>> Kevin
>>
>>
>> On Wed, Aug 24, 2011 at 9:05 AM, Ryan King <ryan@twitter.com> wrote:
>> On Tue, Aug 23, 2011 at 7:58 PM, Kevin Burton <burton@spinn3r.com> wrote:
>> I had a thread going the other day about vector clock memory usage and that it is
>> a series of (clock id, clock):ts and the ability to prune old entries … I'm specifically
>> curious here how often old entries are pruned.
>>
>> If you're storing small columns within Cassandra, say just an integer, the vector
>> clock overhead could easily take up far more space than the data actually in your database.
>>
>> However, if they are pruned, then this shouldn't really be a problem.
>>
>> How much memory is this wasting?
>>
>> I think there is some confusion here: Cassandra doesn't use vector clocks.
>>
>> -ryan
>>
>> Thoughts?
>>
>>
>> Jonathan Ellis <jbellis@gmail.com> to user, Aug 19 (4 days ago):
>> The problem with naive last write wins is that writes don't always
>> arrive at each replica in the same order.  So no, that's a
>> non-starter.
>>
>> Vector clocks are a series of (client id, clock) entries, and usually
>> a timestamp so you can prune old entries.  Obviously implementations
>> can vary, but to pick a specific example, Voldemort [1] uses 2 bytes
>> per client id, a variable number (at least one) of bytes for the
>> clock, and 8 bytes for the timestamp.
>>
>> [1] https://github.com/voldemort/voldemort/blob/master/src/java/voldemort/versioning/VectorClock.java
>>
>>
>> --
>> Founder/CEO Spinn3r.com
>>
>> Location: San Francisco, CA
>> Skype: burtonator
>> Skype-in: (415) 871-0687
>>
>
>
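
The per-entry sizes quoted above from Jonathan Ellis's message (2 bytes per client id, at least one byte for the clock, 8 bytes for the timestamp) can be turned into a rough size estimate. This is only a sketch: it assumes a single timestamp is stored per vector clock and ignores any length headers, and `vector_clock_bytes` is a hypothetical helper, not part of Voldemort or Cassandra. It describes the Voldemort-style layout from the quote; as noted earlier in the thread, Cassandra does not use vector clocks.

```python
CLIENT_ID_BYTES = 2  # per entry, from the quoted message
TIMESTAMP_BYTES = 8  # from the quoted message


def vector_clock_bytes(entries: int, clock_bytes: int = 1) -> int:
    """Estimate the serialized size of one vector clock.

    Assumes one shared timestamp per clock and no length headers;
    both are simplifications made for this sketch.
    """
    if entries < 0 or clock_bytes < 1:
        raise ValueError("entries must be >= 0 and clock_bytes >= 1")
    return entries * (CLIENT_ID_BYTES + clock_bytes) + TIMESTAMP_BYTES


value_bytes = 4  # a small column holding a 32-bit integer
for entries in (1, 3, 10):
    overhead = vector_clock_bytes(entries)
    print(f"{entries} entries: {overhead} bytes of clock for {value_bytes} bytes of data")
```

Under these assumptions, even a clock with one entry is larger than the 4-byte value it versions, which is the kind of overhead the original question was asking about.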
