cassandra-user mailing list archives

From Jeremy Hanna <jeremy.hanna1...@gmail.com>
Subject Re: Memory overhead of vector clocks…. how often are they pruned?
Date Wed, 24 Aug 2011 17:54:07 GMT
When that book was finalized (about a year ago), vector clocks were still planned.
They were removed in August or September of last year, and 0.7 was released in January.
The ticket for vector clocks is here, and you can see the reasoning for not using them at
the bottom: https://issues.apache.org/jira/browse/CASSANDRA-580

On Aug 24, 2011, at 12:41 PM, Kevin Burton wrote:

> This is really interesting… I can track it down, but there are a number of references to Cassandra HAVING vector clocks … which would explain why I can't find out how much memory they are using :-P
> 
> "Cassandra: The Definitive Guide", which I was reading the other night, says that they were introduced in 0.7 but that they're still figuring out what to do with them:
> 
> http://books.google.com/books?id=MKGSbCbEdg0C&pg=PA50&lpg=PA50&dq=Cassandra's+clock+was+introduced+in+version+0.7,+but+its+fate+is+uncertain&source=bl&ots=XoQz3tFa1C&sig=Lhdu5j1xRcTPmP4-YQONhxzfRTU&hl=en&ei=MzdVTurWEJTSiAKU5vXoDA&sa=X&oi=book_result&ct=result&resnum=1&ved=0CBkQ6AEwAA#v=onepage&q&f=false
> 
> … so… are 'timestamps' pruned?  
> 
> Even this mechanism seems like it will dominate the amount of memory used in Cassandra. I could see many installs requiring 2-3x more memory to run Cassandra unless there is a pruning mechanism or some way to minimize the use of timestamps.
> 
> Kevin
> 
> 
> On Wed, Aug 24, 2011 at 9:05 AM, Ryan King <ryan@twitter.com> wrote:
> On Tue, Aug 23, 2011 at 7:58 PM, Kevin Burton <burton@spinn3r.com> wrote:
> I had a thread going the other day about vector clock memory usage, which described a vector clock as a series of (clock id, clock):ts entries with the ability to prune old entries … I'm specifically curious here how often old entries are pruned.
> 
> If you're storing small columns within Cassandra, say just an integer, the vector clock overhead could easily take up far more space than the data that is actually in your database.
> 
> However, if they are pruned, then this shouldn't really be a problem.  
> 
> How much memory is this wasting?
> 
> I think there is some confusion here: Cassandra doesn't use vector clocks.
> 
> -ryan
>  
> Thoughts?
> 
> 
> On Aug 19, Jonathan Ellis <jbellis@gmail.com> wrote:
> The problem with naive last write wins is that writes don't always
> arrive at each replica in the same order.  So no, that's a
> non-starter.
> 
> Vector clocks are a series of (client id, clock) entries, and usually
> a timestamp so you can prune old entries.  Obviously implementations
> can vary, but to pick a specific example, Voldemort [1] uses 2 bytes
> per client id, a variable number (at least one) of bytes for the
> clock, and 8 bytes for the timestamp.
> 
> [1] https://github.com/voldemort/voldemort/blob/master/src/java/voldemort/versioning/VectorClock.java
> 
> 
> -- 
> Founder/CEO Spinn3r.com
> 
> Location: San Francisco, CA
> Skype: burtonator
> Skype-in: (415) 871-0687
> 
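
For a rough sense of scale, the Voldemort layout described in the quoted message (2 bytes per client id, at least 1 byte per clock value, 8 bytes for the timestamp) can be turned into a back-of-the-envelope estimate. This is only an illustrative sketch: the function name and parameters are made up here, it assumes every clock value uses the same number of bytes, and because the quoted description does not say whether the timestamp is stored once per clock or once per entry, it supports both readings. It ignores any headers or length fields a real serialization format might add.

```python
CLIENT_ID_BYTES = 2   # per entry, from the quoted description
TIMESTAMP_BYTES = 8   # from the quoted description


def vector_clock_bytes(num_entries, clock_bytes=1, per_entry_timestamp=False):
    """Estimate the size of a vector clock with `num_entries` entries.

    clock_bytes: bytes per clock value (at least one, per the description).
    per_entry_timestamp: assumption switch; False means a single timestamp
    for the whole clock, True means one timestamp per entry.
    """
    if clock_bytes < 1:
        raise ValueError("a clock value needs at least one byte")
    entry_bytes = CLIENT_ID_BYTES + clock_bytes
    if per_entry_timestamp:
        return num_entries * (entry_bytes + TIMESTAMP_BYTES)
    return num_entries * entry_bytes + TIMESTAMP_BYTES


if __name__ == "__main__":
    payload = 4  # a 4-byte integer column value, as in the question
    for shared in (False, True):
        size = vector_clock_bytes(3, per_entry_timestamp=shared)
        print(f"per_entry_timestamp={shared}: {size} bytes "
              f"({size / payload:.2f}x the payload)")
```

With three entries and 1-byte clock values this gives 17 bytes for a single shared timestamp and 33 bytes for per-entry timestamps, so for very small values the versioning metadata in such a scheme would indeed exceed the payload. None of this applies to Cassandra itself, which, as noted in the thread, does not use vector clocks.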

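The ordering problem mentioned in the quoted message ("writes don't always arrive at each replica in the same order") can be illustrated with a toy model. The sketch below is hypothetical and is not Cassandra's reconciliation code: writes are (timestamp, value) pairs, `apply_in_arrival_order` models the naive rule where whichever write arrives last wins, and `apply_by_timestamp` is shown only for contrast as one way to make the outcome independent of arrival order. Ties between equal timestamps are ignored here.

```python
def apply_in_arrival_order(writes):
    """Naive last write wins: the most recently *arrived* write is kept."""
    value = None
    for _timestamp, new_value in writes:
        value = new_value
    return value


def apply_by_timestamp(writes):
    """Keep the write with the highest timestamp, whatever the arrival order."""
    current = None  # (timestamp, value) of the winning write so far
    for timestamp, new_value in writes:
        if current is None or timestamp > current[0]:
            current = (timestamp, new_value)
    return None if current is None else current[1]


# Two writes to the same column, delivered in different orders.
w1 = (100, "a")
w2 = (200, "b")
replica_1 = [w1, w2]
replica_2 = [w2, w1]

if __name__ == "__main__":
    print(apply_in_arrival_order(replica_1), apply_in_arrival_order(replica_2))
    print(apply_by_timestamp(replica_1), apply_by_timestamp(replica_2))
```

Under the arrival-order rule the two replicas end up with different values ("b" and "a"), while comparing timestamps leaves both with "b".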
