incubator-cassandra-user mailing list archives

From Roger Schildmeijer
Subject Re: Handling failures in Thrift
Date Thu, 13 May 2010 18:20:28 GMT
All the exceptions are documented on the API page on the wiki.

* UnavailableException -- "Not all the replicas required could be created and/or read."
* TimedOutException -- "The node responsible for the write or read did not respond during
the rpc interval specified in your configuration (default 10s). This can happen if the request
is too large, the node is oversaturated with requests, or the node is down but the failure
detector has not yet realized it (usually this takes < 30s)."
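
Both exceptions describe conditions that may clear up on their own, so one possible
reaction (if retrying is acceptable for your application) is to retry with a growing delay.
A minimal sketch in Python; the two exception classes below are stand-ins for the
Thrift-generated ones, and call_with_retry and its parameters are hypothetical names,
not part of any Cassandra API:

```python
import time


# Stand-ins for the Thrift-generated exception classes, defined here only so
# the sketch runs on its own; real code would import the generated classes.
class UnavailableException(Exception):
    pass


class TimedOutException(Exception):
    pass


def call_with_retry(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Run operation(); on Unavailable/TimedOut, wait and try again.

    The delay doubles after each failed attempt. Once max_attempts is
    exhausted, the last exception is re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (UnavailableException, TimedOutException):
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Here operation would be a zero-argument callable wrapping your batch_mutate call. Any
other exception (InvalidRequestException, for example) is deliberately not caught,
since retrying an invalid request will not help.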

It's hard to propose a generic solution. The "proper course of action" depends on your
application domain.
As stated on the wiki, a TimedOutException can have several causes:
  * "request is too large" -- Proposal: try to narrow your request.

  * "node is oversaturated with requests" -- Proposal: if you are using the order-preserving
    partitioner, try the random partitioner for better load balancing. Or do you need more
    nodes in your cluster?

  * "node is down but the failure detector has not yet realized it" -- Proposal: have you
    altered the phi constant in o.a.c.gms.FailureDetector (phiConvictThreshold_,
    default == 8)?
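
For the "request is too large" case, one way to narrow a batch_mutate request is to split
the mutation map into smaller pieces and send each piece separately. A rough sketch in
Python, assuming the mutation map is a dict keyed by row key; chunked and max_keys are
hypothetical names, and a sensible chunk size depends on your data:

```python
from itertools import islice


def chunked(mutation_map, max_keys):
    """Yield successive dicts holding at most max_keys row keys each."""
    if max_keys < 1:
        raise ValueError("max_keys must be at least 1")
    items = iter(mutation_map.items())
    while True:
        # Take the next max_keys (key, value) pairs from the shared iterator.
        chunk = dict(islice(items, max_keys))
        if not chunk:
            return
        yield chunk
```

Each yielded dict can then be passed to its own batch_mutate call. Note that the pieces
are sent independently, so a failure partway through leaves the earlier pieces written.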

// Roger Schildmeijer

On 13 May 2010, at 19:53, Ian Soboroff wrote:

> I searched the Wiki and the mailing list archives a bit but couldn't find the answer.
> If I catch an exception from a Cassandra.Client method, in my case batch_mutate, what's
> the proper course of action?
> Ignoring InvalidRequestException, we have Unavailable, TimedOut, and generic Thrift exceptions.
> Do I just gin up a new client?  Do I need to build the TTransport/Tproto bits as well?
> Thanks,
> Ian
