cassandra-user mailing list archives

From Rana Aich <aichr...@gmail.com>
Subject Re: UnavailableException when data grows
Date Mon, 27 Sep 2010 21:55:20 GMT
Hi Peter,

Thanks for your detailed query...

I have an 8-machine cluster: KVSHIGH1-4 and KVSLOW1-4. As the names suggest,
the KVSLOW boxes have low disk space (~350 GB), whereas the KVSHIGH boxes
have 1.5 TB.

Yet my nodetool shows the following:
192.168.202.202  Down  319.94 GB  7200044730783885730400843868815072654      |<--|
192.168.202.4    Up    382.39 GB  23719654286404067863958492664769598669     |   ^
192.168.202.2    Up    106.81 GB  36701505058375526444137310055285336988     v   |
192.168.202.3    Up    149.81 GB  65098486053779167479528707238121707074     |   ^
192.168.202.201  Up    154.72 GB  79420606800360567885560534277526521273     v   |
192.168.202.204  Up    72.91 GB   85219217446418416293334453572116009608     |   ^
192.168.202.1    Up    29.78 GB   87632302962564279114105239858760976120     v   |
192.168.202.203  Up    9.35 GB    87790520647700936489181912967436646309     |-->|

As you can see, one of our KVSLOW boxes is already down; it's 100% full.
Meanwhile, a box with 1.5 TB of disk holds only 29.78 GB (192.168.202.1)!
I'm using RandomPartitioner. When I run the client program, the Cassandra
daemon takes around 85-130% CPU.

Regards,

Rana



On Mon, Sep 27, 2010 at 2:31 PM, Peter Schuller <peter.schuller@infidyne.com> wrote:

> > How can I handle this kind of situation?
>
> In terms of surviving the problem, a retry on the client side might
> help, assuming the problem is temporary.
>
> However, the fact that you're seeing an issue at all is certainly
> interesting, and the way to avoid it depends on what the problem is.
> My understanding is that UnavailableException indicates that the node
> you are talking to was unable to read from/write to a sufficient
> number of nodes to satisfy your consistency level, presumably either
> because individual requests failed to return in time or because the
> node considers other nodes to be flat-out down.
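The client-side retry suggested above can be sketched roughly as follows.
This is a minimal illustration, not a definitive implementation:
`UnavailableException` here is a placeholder standing in for the Thrift
exception type your client library exposes, `do_request` is whatever call
you are retrying, and the backoff schedule is an assumption.

```python
import time

class UnavailableException(Exception):
    """Placeholder for the Thrift UnavailableException type."""

def with_retry(do_request, attempts=3, base_delay=0.5):
    # Retry a request that may fail transiently with UnavailableException,
    # backing off exponentially between attempts.
    for attempt in range(attempts):
        try:
            return do_request()
        except UnavailableException:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the error to the caller
            time.sleep(base_delay * 2 ** attempt)
```

Note that retrying only papers over the symptom; if the exception is caused
by genuinely down or overloaded nodes, the retries will eventually fail too.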
>
> Can you correlate these issues with server-side activity on the nodes,
> such as background compaction, commitlog rotation or memtable
> flushing? Do you see your nodes saying that other nodes in the cluster
> are "DOWN" and "UP" (flapping)?
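One quick way to check for the flapping Peter asks about is to count DOWN/UP
transitions per node in the gossip log messages. The sketch below runs on an
embedded sample; the log line format is an assumption based on Cassandra logs
of this era, so adjust the regex for your version and feed it real lines from
system.log instead of the sample.

```python
import re
from collections import Counter

# Sample gossip lines (format is an assumption; replace with real log lines).
sample = """\
 INFO [GMFD:1] Gossiper.java InetAddress /192.168.202.202 is now dead.
 INFO [GMFD:1] Gossiper.java InetAddress /192.168.202.202 is now UP
 INFO [GMFD:1] Gossiper.java InetAddress /192.168.202.202 is now dead.
"""

# Count state transitions per node; a high count indicates flapping.
pat = re.compile(r"InetAddress /(\S+) is now (dead|UP)")
flaps = Counter(m.group(1) for m in pat.finditer(sample))
print(flaps.most_common())
```

A node that shows many transitions in a short window is flapping, which
would fit the UnavailableException symptom described in the thread.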
>
> How large is the data set in total (in terms of sstable size on disk),
> and how much memory do you have in your machines (going to page
> cache)?
>
> Have you observed the behavior of your nodes during compaction; in
> particular whether compaction is CPU bound or I/O bound? (That would
> tend to depend on data; generally the larger the individual values the
> more disk bound you'd tend to be.)
>
> Just trying to zero in on what the likely root cause is in this case.
>
> --
> / Peter Schuller
>
