incubator-cassandra-user mailing list archives

From Peter Schuller <peter.schul...@infidyne.com>
Subject Re: UnavailableException with 1 node down and RF=2?
Date Fri, 28 Oct 2011 09:52:03 GMT
>  Thank you for your explanations. Even with a RF=1 and one node down I don't
> understand why I can't at least read the data in the nodes that are still
> up?

You will be able to read data for row keys that do not live on the
node that is down. But for any request to a row which is on the node
that is down, Unavailable is the expected result. If the data simply
does not exist other than on the one single node, and that node is
down, there's nothing Cassandra, or any other system, can do ;)
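
To make that concrete, here is a rough sketch of RF=1 ownership on a
toy ring. It is only an illustration: the token space, the MD5-based
token() function and the helper names are assumptions made for the
example, not Cassandra's actual partitioner or API.

```python
import bisect
import hashlib

TOKEN_SPACE = 2**32  # toy token space, assumed for illustration only


def token(key):
    """Hash a row key onto the toy ring (hypothetical partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % TOKEN_SPACE


def owner(key, ring):
    """Return the single node owning a key when RF=1.

    ring is a sorted list of (token, node) pairs. A node owns the range
    that ends at its own token, wrapping around at the end of the ring.
    """
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token(key)) % len(ring)
    return ring[index][1]


def readable_keys(keys, ring, down):
    """Split keys into (readable, unavailable) for a set of down nodes."""
    readable = [k for k in keys if owner(k, ring) not in down]
    unavailable = [k for k in keys if owner(k, ring) in down]
    return readable, unavailable
```

With three nodes and one of them down, roughly a third of the keys
end up in the unavailable list, and the rest stay readable.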

> Also, why can't I at least perform writes with consistency level ANY and
> failover policy ON_FAIL_TRY_ALL_AVAILABLE...shouldn't the nodes that are up
> be able to take in the writes destined for the node that is down and perform
> hinted handoffs when it comes back again?

You seem to be mixing Hector stuff and Cassandra concepts here. So to
be clear: You can use CL.ANY in order to make writes be accepted even
if the one and only node that owns the data in question is down.
However, that data won't be *readable* until that node (1) comes back
up, and (2) hints are delivered to it. This is all in Cassandra.
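
As a rough sketch of that sequence, here is a toy in-memory model with
RF=1. ToyCluster, its methods and the Unavailable class are all made
up for the example and are not Cassandra's or Hector's API. In
particular, hint delivery is an explicit call here, which the real
system does not require.

```python
class Unavailable(Exception):
    """Toy stand-in for the error returned when a replica is down."""


class ToyCluster:
    def __init__(self, owner_of):
        # owner_of maps each row key to its single owning node (RF=1).
        self.owner_of = dict(owner_of)
        nodes = set(self.owner_of.values())
        self.up = {n: True for n in nodes}
        self.data = {n: {} for n in nodes}
        self.hints = []  # pending (node, key, value) hints

    def set_up(self, node, is_up):
        self.up[node] = is_up

    def write(self, key, value, cl="ONE"):
        node = self.owner_of[key]
        if self.up[node]:
            self.data[node][key] = value
        elif cl == "ANY":
            # Accepted, but only stored as a hint for the down owner.
            self.hints.append((node, key, value))
        else:
            raise Unavailable(key)

    def read(self, key):
        node = self.owner_of[key]
        if not self.up[node]:
            raise Unavailable(key)
        return self.data[node].get(key)

    def deliver_hints(self):
        # Replay hints whose target is up again; keep the rest pending.
        pending = []
        for node, key, value in self.hints:
            if self.up[node]:
                self.data[node][key] = value
            else:
                pending.append((node, key, value))
        self.hints = pending
```

In this model a write at ANY to a key whose owner is down succeeds,
but read() keeps raising Unavailable until set_up() has marked the
owner as up, and it only returns the value once deliver_hints() has
run.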

The failover policy stuff applies to Hector and how it chooses to
select nodes, and should be orthogonal to whether or not data is
readable as such. Basically, don't try to use that to get around lack
of data due to nodes being down.

(Also, note that while I don't remember offhand, I don't think an
Unavailable response will be retried on all available nodes, since
Unavailable indicates that the coordinator responded correctly and
that replica nodes are in fact down. I would expect the policy to
apply to cases where communication with the coordinator node fails.
But I am speculating here and this might be wrong.)

> Unless by construction Cassandra
> behaves in the way you describe (which is perfectly fine and I will use it
> that way from now on) it would be logical for the RF=1 to not affect the
> behaviour I expect from just reading the top level descriptions of Cassandra
> behaviour I found in the documentation.

If you mean that rows that are NOT on the node that is down should be
readable, then that is indeed the case. If you are unable to read data
from other rows, that is definitely unexpected.

In *that* case, the failover policy that you mention might be at play.
I.e., you want the Hector client not to fail a request just because a
single node happens to be down. But since you're getting an
"unavailable" exception, that indicates that Hector was able to talk
to the selected Cassandra node, and that the node in question gave an
Unavailable exception back indicating that the read or write could not
be serviced at the given consistency level due to nodes being down.
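
The check behind that exception can be sketched roughly as below. The
function and class names are made up for the example, and only the
ONE, QUORUM and ALL levels are modelled. The quorum size is the usual
RF // 2 + 1.

```python
class UnavailableException(Exception):
    """Toy stand-in for the error a coordinator sends back."""


def required_replicas(consistency_level, rf):
    """Number of live replicas needed at a given consistency level."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return rf // 2 + 1
    if consistency_level == "ALL":
        return rf
    raise ValueError("unsupported consistency level: %s" % consistency_level)


def check_available(consistency_level, replicas, live_nodes):
    """Raise UnavailableException if too few replicas of a row are up.

    replicas is the list of nodes holding the row (its length is RF);
    live_nodes is the set of nodes currently considered up.
    """
    needed = required_replicas(consistency_level, len(replicas))
    alive = sum(1 for node in replicas if node in live_nodes)
    if alive < needed:
        raise UnavailableException(
            "need %d live replicas, have %d" % (needed, alive))
    return alive
```

Note that with RF=2 a quorum is 2, so with one of the two replicas
down a request at QUORUM fails this check while the same request at
ONE passes.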

I would start by double checking exactly which row key(s) are being
written to/read from, and whether they are truly not on the node(s)
that are down.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
