From: TSANG Yiu Wing
To: user@cassandra.apache.org
Date: Tue, 8 Feb 2011 11:07:07 +0800
Subject: Re: seed node failure crash the whole cluster

I will continue this issue here:
http://groups.google.com/group/scale7/browse_thread/thread/dd74f1d6265ae2e7

Thanks

On Tue, Feb 8, 2011 at 7:44 AM, Dan Washusen wrote:
> Hi,
> I've added some comments and questions inline.
>
> Cheers,
> Dan
> On 8 February 2011 10:00, Jonathan Ellis wrote:
>>
>> On Mon, Feb 7, 2011 at 1:51 AM, TSANG Yiu Wing wrote:
>> > cassandra version: 0.7
>> >
>> > client library: scale7-pelops / 1.0-RC1-0.7.0-SNAPSHOT
>> >
>> > cluster: 3 machines (A, B, C)
>> >
>> > details:
>> > it works perfectly when all 3 machines are up and running
>> >
>> > but if the seed machine is down, the following problems happen:
>> >
>> > 1) new client connections cannot be established
>>
>> Sounds like Pelops relies on the seed node to introduce it to the
>> cluster. You should configure it either with a hardcoded list of
>> nodes or use something like RRDNS instead. I don't use Pelops so I
>> can't help other than that. (I believe there is a mailing list for
>> Pelops though.)
>
> When dynamic node discovery is turned on (off by default) it doesn't
> (shouldn't) rely on the initial seed node once past initialization. So
> either make sure you have dynamic node discovery turned on or seed Pelops
> with all nodes in your cluster...
> It would be helpful if you provided more information about the errors you're
> seeing, preferably with debug-level logging turned on.
>
>>
>> > 2) if a client keeps connecting to and operating on (issuing get and
>> > update) the cluster when the seed is down, the working client will
>> > throw an exception upon the next operation
>>
>> I know Hector supports transparent failover to another Cassandra node.
>> Perhaps Pelops does not.
>
> Pelops will validate connections at a configurable period (60 seconds by
> default) and remove them from the pool. Pelops will also retry the
> operation three times (configurable) against a different node in the pool
> each time.
> If you want Pelops to take more aggressive action when it detects downed
> nodes, then check out
> org.scale7.cassandra.pelops.pool.CommonsBackedPool.INodeSuspensionStrategy.
>
>>
>> > 3) when using cassandra-cli to connect to the remaining nodes in the
>> > cluster, "Internal error processing get_range_slices" happens when
>> > querying a column family
>> list ;
>>
>> Cassandra always logs the cause of internal errors in system.log, so
>> you should look there.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
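The client-side behaviour discussed in this thread (seeding the client with every node rather than only the seed, and retrying a failed operation against a different node each time) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the Pelops or Hector API: `FailoverClient` and its `execute` method are hypothetical names, node addresses are plain strings, and a failed attempt is modelled as a `RuntimeException` rather than a real Thrift transport error.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical client wrapper (NOT the Pelops or Hector API) that is given
 * every node in the cluster up front and retries a failed operation on a
 * different node each time, up to maxAttempts.
 */
public class FailoverClient {
    private final List<String> nodes;  // hardcoded list of all nodes, not just the seed
    private final int maxAttempts;     // e.g. 3, like the retry count mentioned for Pelops
    private int next = 0;              // round-robin cursor into nodes

    public FailoverClient(List<String> nodes, int maxAttempts) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.nodes = new ArrayList<>(nodes);
        this.maxAttempts = maxAttempts;
    }

    /** Runs op against successive nodes until one succeeds or the attempts run out. */
    public <T> T execute(Function<String, T> op) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String node = nodes.get(next);
            next = (next + 1) % nodes.size();  // the next attempt goes to a different node
            try {
                return op.apply(node);
            } catch (RuntimeException e) {
                last = e;                      // remember the failure and try the next node
            }
        }
        throw new IllegalStateException("all " + maxAttempts + " attempts failed", last);
    }

    public static void main(String[] args) {
        // Simulate the scenario from the thread: node A (the seed) is down.
        FailoverClient client = new FailoverClient(List.of("A", "B", "C"), 3);
        String result = client.execute(node -> {
            if (node.equals("A")) {
                throw new RuntimeException("connection refused: " + node);
            }
            return "served by " + node;
        });
        System.out.println(result);  // prints "served by B"
    }
}
```

Running `main` prints `served by B`: the attempt against node A fails and the second attempt goes to B. A real connection pool would also drop or suspend the failing node (as described above for Pelops' periodic connection validation and `INodeSuspensionStrategy`) instead of cycling back to it on a later call.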