incubator-cassandra-user mailing list archives

From TSANG Yiu Wing <ywts...@gmail.com>
Subject Re: seed node failure crash the whole cluster
Date Tue, 08 Feb 2011 03:07:07 GMT
I will continue this issue here:

http://groups.google.com/group/scale7/browse_thread/thread/dd74f1d6265ae2e7

Thanks


On Tue, Feb 8, 2011 at 7:44 AM, Dan Washusen <dan@reactive.org> wrote:
> Hi,
> I've added some comments and questions inline.
>
> Cheers,
> Dan
> On 8 February 2011 10:00, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> On Mon, Feb 7, 2011 at 1:51 AM, TSANG Yiu Wing <ywtsang@gmail.com> wrote:
>> > cassandra version: 0.7
>> >
>> > client library: scale7-pelops / 1.0-RC1-0.7.0-SNAPSHOT
>> >
>> > cluster: 3 machines (A, B, C)
>> >
>> > details:
>> > it works perfectly when all 3 machines are up and running
>> >
>> > but if the seed machine is down, these problems happen:
>> >
>> > 1) new client connections cannot be established
>>
>> Sounds like Pelops relies on the seed node to introduce it to the
>> cluster.  You should configure it either with a hardcoded list of
>> nodes or use something like RRDNS instead.  I don't use Pelops so I
>> can't help beyond that.  (I believe there is a mailing list for
>> Pelops, though.)
>
> When dynamic node discovery is turned on (it's off by default), Pelops
> doesn't (shouldn't) rely on the initial seed node once past
> initialization.  So either make sure you have dynamic node discovery
> turned on, or seed Pelops with all the nodes in your cluster...
> It would be helpful if you provided more information about the errors
> you're seeing, preferably with debug-level logging turned on.
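
A minimal, self-contained sketch of the "seed with all nodes" idea above:
a new client picks any live contact node instead of depending on a single
seed. This is not Pelops' actual code; the class name and the liveness
check are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

public class ContactNodes {
    // Return the first contact node that passes the liveness check,
    // so a client can still bootstrap when the original seed is down.
    static Optional<String> firstLive(List<String> nodes, Predicate<String> isUp) {
        return nodes.stream().filter(isUp).findFirst();
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("A:9160", "B:9160", "C:9160");
        Set<String> down = Set.of("A:9160"); // simulate the seed being down
        System.out.println(firstLive(nodes, n -> !down.contains(n)).orElse("none"));
        // prints: B:9160
    }
}
```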
>
>>
>> > 2) if a client keeps connecting to and operating on the cluster
>> > (issuing gets and updates), then when the seed goes down the working
>> > client throws an exception on its next operation
>>
>> I know Hector supports transparent failover to another Cassandra node.
>>  Perhaps Pelops does not.
>
> Pelops will validate connections at a configurable interval (60 seconds
> by default) and remove dead ones from the pool.  Pelops will also retry
> the operation up to three times (configurable), against a different node
> in the pool each time.
> If you want Pelops to take more aggressive action when it detects downed
> nodes, then check out
> org.scale7.cassandra.pelops.pool.CommonsBackedPool.INodeSuspensionStrategy.
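
The retry-against-a-different-node behaviour described above can be
sketched as follows. This is not Pelops' implementation; the class,
method, and exception choices are illustrative assumptions.

```java
import java.util.List;
import java.util.function.Function;

public class RetryingClient {
    // Try the operation against a different node on each attempt,
    // up to maxRetries attempts, rotating through the pool.
    static <T> T execute(List<String> nodes, int maxRetries, Function<String, T> op) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            String node = nodes.get(attempt % nodes.size());
            try {
                return op.apply(node);
            } catch (RuntimeException e) {
                last = e; // node looked down; fall through to the next one
            }
        }
        throw last; // every attempt failed
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("A:9160", "B:9160", "C:9160");
        // Simulate the seed (A) being down: the call fails over to B.
        String result = execute(nodes, 3, n -> {
            if (n.startsWith("A")) throw new RuntimeException("connection refused");
            return "ok via " + n;
        });
        System.out.println(result); // prints: ok via B:9160
    }
}
```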
>
>>
>> > 3) using cassandra-cli to connect to the remaining nodes in the
>> > cluster, "Internal error processing get_range_slices" happens when
>> > querying a column family:
>> >> list <cf>;
>>
>> Cassandra always logs the cause of internal errors in system.log, so
>> you should look there.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>
