cassandra-user mailing list archives

From Artie Copeland <>
Subject Re: row cache during bootstrap
Date Mon, 09 Aug 2010 22:01:13 GMT
On Sun, Aug 8, 2010 at 5:24 AM, aaron morton <> wrote:

> Not sure how feasible it is or if it's planned. But it would probably
> require that the nodes are able to share the state of their row cache so as
> to know which parts to warm. Otherwise it sounds like you're assuming the
> node can hold the entire data set in memory.

I'm not assuming the node can hold the entire Cassandra data set in
memory, if that's what you meant. I was thinking of sharing the state of the
row cache, but only for those keys that are being moved for the token. The
other keys can stay hidden from the node.
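
To make the idea concrete, here is a minimal sketch of selecting only the cached keys that fall in the token range being moved. Everything in it is an assumption for illustration: `token_for` uses an MD5 digest as a stand-in token function, the range is treated as (left, right] with wrap-around on the ring, and none of these names are Cassandra APIs.

```python
import hashlib


def token_for(key: bytes) -> int:
    # Stand-in token function (assumption): MD5 digest as an unsigned integer.
    return int.from_bytes(hashlib.md5(key).digest(), "big")


def in_range(token: int, left: int, right: int) -> bool:
    # The range is (left, right]; it wraps around the ring when left >= right.
    if left < right:
        return left < token <= right
    return token > left or token <= right


def cached_keys_to_share(cached_keys, left, right):
    # Only keys whose token falls in the moving range are exposed to the
    # new node; all other cached keys stay hidden from it.
    return [k for k in cached_keys if in_range(token_for(k), left, right)]
```

The donor node would then hand over only that filtered list along with the streamed data.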

> If you know in your application when you would like data to be in the
> cache, you can send a query like get_range_slices to the cluster and ask for
> 0 columns. That will warm the row cache for the keys it hits.

This is a tough one, as our row cache is over 20 million and takes a while to
reach a high hit ratio, so the node is taking requests while we try to
preload it. If it were possible to bring up a node that doesn't announce its
availability to the cluster, that would help us warm the cache manually. I
know this feature is in the issue tracker currently, but it didn't look like
it would come out anytime before 0.8.
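
For reference, the paging loop for such a warm-up pass might look like the sketch below. `fetch_page` is a hypothetical callable standing in for a get_range_slices call that asks for 0 columns and returns the row keys it touched. The sketch assumes the start key is inclusive, so every page after the first repeats the last key of the previous page.

```python
def warm_keys(fetch_page, page_size=1000, start_key=""):
    """Walk the key range page by page and return how many rows were touched.

    fetch_page(start_key, count) is a hypothetical stand-in for a
    get_range_slices call requesting 0 columns. It must return a list of at
    most `count` row keys, beginning at start_key (inclusive).
    """
    if page_size < 2:
        # With an inclusive start key, a page of 1 would never advance.
        raise ValueError("page_size must be at least 2")
    touched = 0
    first_page = True
    while True:
        keys = fetch_page(start_key, page_size)
        # After the first page, the first key repeats the previous last key.
        new_keys = keys if first_page else keys[1:]
        touched += len(new_keys)
        if len(keys) < page_size:
            break  # a short page means the end of the range was reached
        start_key = keys[-1]
        first_page = False
    return touched
```

This only drives the reads; whether the rows end up in the row cache depends on the node that serves each request.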

> I have heard it mentioned that the coordinator node will take action
> when one node is considered to be running slow. So it may be able to work
> around the new node until it gets warmed up.

That is interesting; I hadn't heard that one. With the parallel reads that
are happening, it makes sense that it would be possible, unless the data is
local. I believe in that case it always prefers to read locally rather than
over the network, so if the local machine is the slow node, that wouldn't
help.

> Are you adding nodes often?

Currently not that often. The main issue is that we have very stringent
latency requirements, and for anything that would affect them we have to
understand the worst-case cost to see if we can avoid it.

> Aaron
> On 7 Aug 2010, at 11:17, Artie Copeland wrote:
> The way I understand row caches is that each node has an independent
> cache, in that nodes do not share their cache contents with other
> nodes. If that's the case, is it also true that when a new node is added to
> the cluster it has to build up its own cache? If so, I see that
> as a possible performance bottleneck once the node starts to accept
> requests, since there is no way I know of to warm the cache without adding
> the node to the cluster. Would it be infeasible to have part of the
> bootstrap process stream not only data from nodes but also the cached rows
> that are associated with those same keys? That would allow the new node to
> provide the best performance once the bootstrap process finishes.
> --

