cassandra-user mailing list archives

From Parag Shah <ps...@proofpoint.com>
Subject Re: What causes NoHostAvailableException, WriteTimeoutException, and UnavailableException?
Date Mon, 24 Nov 2014 23:01:59 GMT
In our case, the timeouts happened because internode authentication was turned on and,
by default, the user column family in the system_auth keyspace is replicated on only 1 node.
As a result, every authentication request went to that single node. We fixed this by setting
the replication factor on the system_auth keyspace to n (the number of nodes). We also had to
raise permissions_validity_in_ms from its default of 2000 ms to a larger value.
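
For illustration, here is a minimal Python sketch that builds the CQL statement for the
replication change described above. The helper name system_auth_replication_cql is
hypothetical, and the sketch assumes SimpleStrategy; a multi-datacenter cluster would
normally use NetworkTopologyStrategy with per-datacenter counts instead.

```python
def system_auth_replication_cql(node_count):
    """Build a CQL statement that replicates system_auth to node_count nodes.

    Hypothetical helper for illustration only. Assumes SimpleStrategy,
    which is an assumption and not something stated in the thread.
    """
    if not isinstance(node_count, int) or node_count < 1:
        raise ValueError("node_count must be a positive integer")
    return (
        "ALTER KEYSPACE system_auth WITH replication = "
        "{'class': 'SimpleStrategy', 'replication_factor': %d};" % node_count
    )


if __name__ == "__main__":
    # Example: a 3-node cluster, so every node holds the auth data.
    print(system_auth_replication_cql(3))
```

The resulting statement can be run from cqlsh. Changing the replication factor does not
move existing data by itself, so it is typically followed by a repair of the system_auth
keyspace on each node.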

Hope this helps.

Parag

From: Robert Coli <rcoli@eventbrite.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, November 24, 2014 at 2:52 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: What causes NoHostAvailableException, WriteTimeoutException, and UnavailableException?

On Mon, Nov 24, 2014 at 12:57 PM, Kevin Burton <burton@spinn3r.com> wrote:
> I’m trying to track down some exceptions in our production cluster. I bumped up our write
> load and now I’m getting a non-trivial number of these exceptions, somewhere on the order
> of 100 per hour.
>
> All machines have a somewhat high CPU load because they’re doing other tasks. I’m worried
> that perhaps my background tasks are just overloading Cassandra, and one way to mitigate this
> is to nice them to the least favorable priority (this is my first task).

Two out of three of them are timeouts or lack of availability. Seeing this across your cluster
is usually associated with hitting a "pre-fail" condition in terms of GC, where the amount
of data stored per node makes the steady state working set larger than available non-fragmented
heap. If you're graphing GC time, I would expect to see a concomitant spike there.

=Rob

