Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
In-Reply-To: <AANLkTikcvd0f0V7sLxtqimkkFe9UbhjOgs+WCoLFkR=2@mail.gmail.com>
References: <AANLkTinwByqdV_7Wwr5Ls9uKSncNqJoA8UeuKN0fCHLK@mail.gmail.com>
	<AANLkTin0GaQnjBSuXod3byXDTnbK-bPNJmte74oYLTjx@mail.gmail.com>
	<AANLkTikcvd0f0V7sLxtqimkkFe9UbhjOgs+WCoLFkR=2@mail.gmail.com>
Date: Tue, 8 Feb 2011 11:07:07 +0800
Message-ID: <AANLkTimc=s5iWXL20Xe5Jw3+ttKZbr08CBPi1kY1cgOT@mail.gmail.com>
Subject: Re: seed node failure crash the whole cluster
From: TSANG Yiu Wing <ywtsang@gmail.com>
To: user@cassandra.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

I will continue this issue here:

http://groups.google.com/group/scale7/browse_thread/thread/dd74f1d6265ae2e7

thanks


On Tue, Feb 8, 2011 at 7:44 AM, Dan Washusen <dan@reactive.org> wrote:
> Hi,
> I've added some comments and questions inline.
>
> Cheers,
> Dan
> On 8 February 2011 10:00, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> On Mon, Feb 7, 2011 at 1:51 AM, TSANG Yiu Wing <ywtsang@gmail.com> wrote:
>> > cassandra version: 0.7
>> >
>> > client library: scale7-pelops / 1.0-RC1-0.7.0-SNAPSHOT
>> >
>> > cluster: 3 machines (A, B, C)
>> >
>> > details:
>> > it works perfectly when all 3 machines are up and running
>> >
>> > but if the seed machine is down, the problems happen:
>> >
>> > 1) new client connection cannot be established
>>
>> sounds like pelops relies on the seed node to introduce it to the
>> cluster. you should configure it either with a hardcoded list of
>> nodes or use something like RRDNS instead. I don't use pelops so I
>> can't help other than that. (I believe there is a mailing list for
>> Pelops though.)
>
> When dynamic node discovery is turned on (off by default) it doesn't
> (shouldn't) rely on the initial seed node once past initialization. So
> either make sure you have dynamic node discovery turned on or seed Pelops
> with all nodes in your cluster...
> It would be helpful if you provided more information about the errors you're
> seeing preferably with debug level logging turned on.
>
>>
>> > 2) if a client keeps connecting to and operating at (issue get and
>> > update) the cluster, when the seed is down, the working client will
>> > throw exception upon the next operation
>>
>> I know Hector supports transparent failover to another Cassandra node.
>> Perhaps Pelops does not.
>
> Pelops will validate connections at a configurable period (60 seconds by
> default) and remove them from the pool. Pelops will also retry the
> operation three times (configurable) against a different node in the pool
> each time.
> If you want Pelops to take more aggressive actions when it detects downed
> nodes then check out
> org.scale7.cassandra.pelops.pool.CommonsBackedPool.INodeSuspensionStrategy.
>
>>
>> > 3) using cassandra-cli to connect the remaining nodes in the cluster,
>> > "Internal error processing get_range_slices" will happen when querying
>> > column family
>> >> list <cf>;
>>
>> Cassandra always logs the cause of internal errors in system.log, so
>> you should look there.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>
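
For readers following the thread, the retry behaviour described above (try the operation again, against a different node each time, up to a configurable limit) can be sketched roughly as below. This is a generic illustration in plain Java and not the Pelops or Hector API: the class FailoverSketch, the method executeWithFailover, and the node names A, B, C are all hypothetical, and a real client would also handle connection pooling and node suspension.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from Pelops or Hector.
class FailoverSketch {

    /** Thrown when every attempted node failed. */
    static class AllNodesFailedException extends RuntimeException {
        AllNodesFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs the operation against up to maxAttempts distinct nodes, starting at
     * startIndex and moving round-robin through the configured node list.
     */
    static <T> T executeWithFailover(List<String> nodes, int startIndex,
                                     int maxAttempts, Function<String, T> operation) {
        if (nodes.isEmpty() || maxAttempts < 1) {
            throw new IllegalArgumentException("need at least one node and one attempt");
        }
        // Never try the same node twice within a single call.
        int attempts = Math.min(maxAttempts, nodes.size());
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            String node = nodes.get(Math.floorMod(startIndex + i, nodes.size()));
            try {
                return operation.apply(node);
            } catch (RuntimeException e) {
                last = e; // this node looks down; move on to the next one
            }
        }
        throw new AllNodesFailedException("all " + attempts + " attempts failed", last);
    }

    public static void main(String[] args) {
        // The client is configured with every node, not only the seed (A).
        List<String> nodes = Arrays.asList("A", "B", "C");
        String result = executeWithFailover(nodes, 0, 3, node -> {
            if (node.equals("A")) {
                throw new IllegalStateException("node A is down"); // simulated seed failure
            }
            return "served by " + node;
        });
        System.out.println(result); // prints "served by B"
    }
}
```

The point of the sketch is that the client holds the full node list up front, so losing the seed (A here) only costs one failed attempt before the operation lands on B.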