incubator-cassandra-user mailing list archives

From Dan Washusen <...@reactive.org>
Subject Re: seed node failure crash the whole cluster
Date Mon, 07 Feb 2011 23:44:20 GMT
Hi,
I've added some comments and questions inline.

Cheers,
Dan

On 8 February 2011 10:00, Jonathan Ellis <jbellis@gmail.com> wrote:

> On Mon, Feb 7, 2011 at 1:51 AM, TSANG Yiu Wing <ywtsang@gmail.com> wrote:
> > cassandra version: 0.7
> >
> > client library: scale7-pelops / 1.0-RC1-0.7.0-SNAPSHOT
> >
> > cluster: 3 machines (A, B, C)
> >
> > details:
> > it works perfectly when all 3 machines are up and running
> >
> > but if the seed machine is down, the problems happen:
> >
> > 1) new client connection cannot be established
>
> sounds like pelops relies on the seed node to introduce it to the
> cluster.  you should configure it either with a hardcoded list of
> nodes or use something like RRDNS instead.  I don't use pelops so I
> can't help other than that.  (I believe there is a mailing list for
> Pelops though.)
>

When dynamic node discovery is turned on (it's off by default), Pelops doesn't
(shouldn't) rely on the initial seed node once past initialization.  So either
make sure you have dynamic node discovery turned on, or seed Pelops with all
the nodes in your cluster...
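
To make that concrete, a minimal sketch of seeding the pool with every node
rather than just one.  This is from memory of the 1.0-RC1 API, so the exact
constructor/method signatures (and any dynamicNodeDiscovery flag your version
exposes) may need adjusting; "main" and "my_keyspace" are placeholder names:

```java
import org.scale7.cassandra.pelops.Cluster;
import org.scale7.cassandra.pelops.Pelops;

public class PelopsSetup {
    public static void main(String[] args) {
        // List every node, not just the seed, so losing one node
        // doesn't leave the client with nothing to connect to.
        Cluster cluster = new Cluster("nodeA,nodeB,nodeC", 9160);
        Pelops.addPool("main", cluster, "my_keyspace");
    }
}
```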

It would be helpful if you provided more information about the errors you're
seeing, preferably with debug-level logging turned on.


>
> > 2) if a client keeps connecting to and operating at (issue get and
> > update) the cluster, when the seed is down, the working client will
> > throw exception upon the next operation
>
> I know Hector supports transparent failover to another Cassandra node.
>  Perhaps Pelops does not.
>

Pelops will validate connections at a configurable period (60 seconds by
default) and remove them from the pool.  Pelops will also retry the
operation three times (configurable) against a different node in the pool
each time.
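
The retry behaviour described above amounts to: on failure, move to a
different node and try again, up to the configured attempt limit.  A
self-contained sketch of that idea (illustrative names only, not the Pelops
API):

```java
import java.util.List;
import java.util.function.Function;

public class FailoverRetry {
    // Try the operation against successive nodes, up to maxAttempts times,
    // moving to a different node after each failure.
    public static String execute(List<String> nodes, int maxAttempts,
                                 Function<String, String> op) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String node = nodes.get(attempt % nodes.size());
            try {
                return op.apply(node);
            } catch (RuntimeException e) {
                last = e; // this node failed; fall through to the next one
            }
        }
        throw last != null ? last : new IllegalStateException("no attempts made");
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("10.0.0.1", "10.0.0.2", "10.0.0.3");
        // Simulate the first node (the downed seed) refusing connections:
        String result = execute(nodes, 3, node -> {
            if (node.equals("10.0.0.1")) {
                throw new RuntimeException("connection refused");
            }
            return "ok from " + node;
        });
        System.out.println(result); // ok from 10.0.0.2
    }
}
```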

If you want Pelops to take more aggressive action when it detects downed
nodes then check out
org.scale7.cassandra.pelops.pool.CommonsBackedPool.INodeSuspensionStrategy.
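
For a feel of what a suspension strategy does, here's a generic sketch of the
idea: after repeated failures, take a node out of rotation for a cool-off
period instead of waiting for the periodic validation pass to evict its
connections.  All names here are hypothetical; this is not the actual Pelops
interface:

```java
import java.util.HashMap;
import java.util.Map;

public class NodeSuspension {
    private final Map<String, Integer> failures = new HashMap<>();
    private final Map<String, Long> suspendedUntil = new HashMap<>();
    private final int failureThreshold;
    private final long suspensionMillis;

    public NodeSuspension(int failureThreshold, long suspensionMillis) {
        this.failureThreshold = failureThreshold;
        this.suspensionMillis = suspensionMillis;
    }

    // Count consecutive failures; once the threshold is hit, suspend the
    // node for the configured cool-off period and reset its counter.
    public void recordFailure(String node, long nowMillis) {
        int count = failures.merge(node, 1, Integer::sum);
        if (count >= failureThreshold) {
            suspendedUntil.put(node, nowMillis + suspensionMillis);
            failures.put(node, 0);
        }
    }

    // A suspended node should be skipped when borrowing connections.
    public boolean isSuspended(String node, long nowMillis) {
        Long until = suspendedUntil.get(node);
        return until != null && nowMillis < until;
    }
}
```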


>
> > 3) using cassandra-cli to connect the remaining nodes in the cluster,
> > "Internal error processing get_range_slices" will happen when querying
> > column family
> >> list <cf>;
>
> Cassandra always logs the cause of internal errors in system.log, so
> you should look there.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
