incubator-cassandra-user mailing list archives

From Peter Schuller <peter.schul...@infidyne.com>
Subject Re: Severe Reliability Problems - 0.7 RC2
Date Mon, 20 Dec 2010 21:05:58 GMT
> There were a couple of threads on lkml recently that may be relevant,
> but I have to run so I can't find the URL:s atm (todo later tonight).

Ok, I cannot figure out how to find the "first" message in a thread in
any of the lkml archives, but these two threads may be of interest,
especially if you can find their beginnings:

   http://lkml.indiana.edu/hypermail/linux/kernel/1011.3/00030.html

And to a lesser extent (I started that before knowing about the above one):

   http://lkml.indiana.edu/hypermail/linux/kernel/1011.3/00252.html

They don't really describe the same symptoms, but they contain some
good tips on monitoring what's going on, and some of the suggestions
(numactl interleaving, avoiding higher-order allocations) might
conceivably be useful in this case too, at least on the theory that
some kind of eviction or looking-for-free-space loop is what's
spinning (and yes, this is an assumption based on very little
evidence...).
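
As a rough illustration of the monitoring idea, here is a minimal sketch
that summarizes /proc/buddyinfo, where each column is the number of free
blocks of a given order (order 0 first, 2**order pages per block). The
function names are hypothetical and the cutoff of order 2 is an arbitrary
choice for the example, not something taken from the threads above; a
shrinking total for the higher orders would only hint at fragmentation,
not prove that it is what's spinning.

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free block count per order, order 0 first]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "Node 0, zone DMA32 <order 0 count> <order 1 count> ..."
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(n) for n in parts[4:]]
    return zones


def free_pages_at_or_above(counts, min_order):
    """Total free pages held in blocks of at least 2**min_order pages."""
    return sum(n << order for order, n in enumerate(counts) if order >= min_order)


if __name__ == "__main__":
    try:
        with open("/proc/buddyinfo") as f:
            info = parse_buddyinfo(f.read())
    except OSError:
        info = {}  # not Linux, or /proc is not readable
    for (node, zone), counts in sorted(info.items()):
        # min_order=2 is an arbitrary cutoff chosen for illustration
        print(node, zone, "free pages in order>=2 blocks:",
              free_pages_at_or_above(counts, 2))
```

Running this periodically and comparing the output over time is one way to
see whether the larger free blocks are disappearing while the node is
misbehaving.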

Also, you're virtualized (given %steal), right? I wonder how much
that affects the vm subsystem in the guest kernel (I don't really
know how much guest<->host cooperation there is nowadays on EC2
etc).
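
For reference, %steal can be estimated directly from the aggregate "cpu"
line in /proc/stat by taking two samples and comparing the deltas. This is
only a sketch under the assumption of the usual field order (user, nice,
system, idle, iowait, irq, softirq, steal, then guest fields); the function
names and the one-second interval are made up for the example.

```python
import time


def read_cpu_fields(stat_text):
    """Return the counters from the aggregate 'cpu' line as a list of ints."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return [int(x) for x in parts[1:]]
    raise ValueError("no aggregate cpu line found")


def steal_percent(before, after):
    """Percentage of elapsed CPU time reported as steal between two samples."""
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice]
    # Guest time is already counted in user/nice, so only the first 8 are summed.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if len(deltas) < 8 or total <= 0:
        return 0.0  # kernel without a steal field, or no time elapsed
    return 100.0 * deltas[7] / total


if __name__ == "__main__":
    try:
        with open("/proc/stat") as f:
            first = read_cpu_fields(f.read())
        time.sleep(1)  # sampling interval, arbitrary for this sketch
        with open("/proc/stat") as f:
            second = read_cpu_fields(f.read())
        print("steal: %.1f%%" % steal_percent(first, second))
    except (OSError, ValueError):
        print("/proc/stat not available or not in the expected format")
```

A consistently non-zero result indicates that the hypervisor is giving some
of the guest's CPU time to something else, which is what tools like top and
vmstat report as steal.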

-- 
/ Peter Schuller
