From: "B. Todd Burruss"
To: cassandra-user@incubator.apache.org
Subject: Re: ring state out of sync in build 883477
Date: Tue, 24 Nov 2009 09:19:14 -0800
Message-ID: <1259083154.2351.12.camel@btoddb-laptop>
References: <1259023124.2351.11.camel@btoddb-laptop>

they all were restarted at various times. for vmguest85, the other three are seed nodes.

On Mon, 2009-11-23 at 19:21 -0600, Jonathan Ellis wrote:
> So vmguest85 was restarted, but gen-app02 hasn't told it that there
> are 2 other nodes that are down?
>
> Which one is the seed node?
>
> On Mon, Nov 23, 2009 at 6:38 PM, B. Todd Burruss wrote:
> > i'm observing the following on a cluster that started with 4 nodes. i have
> > been killing and restarting the various nodes as i test cassandra, and now
> > i'm seeing a lot of NotFoundExceptions in the client because of what
> > i believe is ring state that is out of sync between the two nodes that are
> > still up and available. The first ring state shown below reflects the
> > current state of the cluster. Also I have seen similar issues when one of
> > the nodes thinks another node is still available when in fact it has been
> > killed. it seems to be related to bringing up and killing nodes too fast
> > and not letting them figure out when a node is "dead". in this case i see
> > a TimedOutException related to the NIO SocketChannel class.
> >
> > thx!
> >
> > [cassandra.883477]$ bin/nodeprobe -host gen-app02.dev.real.com -port 8080 ring
> > Address         Status  Load      Range                                     Ring
> >                                   144038903974614862325597275257769797985
> > 172.27.128.186  Down    22.17 MB  31124469348629903091013930339840898757    |<--|
> > 172.27.128.23   Down    22.17 MB  64378740291415296162944450043143967518    |   |
> > 172.27.128.22   Up      22.17 MB  121134220722269938669001112695509564769   |   |
> > 172.27.128.185  Up      14.69 MB  144038903974614862325597275257769797985   |-->|
> >
> > [cassandra.883477]$ bin/nodeprobe -host vmguest85.prognet.com -port 8080 ring
> > Address         Status  Load      Range                                     Ring
> >                                   144038903974614862325597275257769797985
> > 172.27.128.22   Up      22.17 MB  121134220722269938669001112695509564769   |<--|
> > 172.27.128.185  Up      14.69 MB  144038903974614862325597275257769797985   |-->|
> > [cassandra.883477]$
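
One way to spot this kind of disagreement without eyeballing the two listings is to parse each node's ring output and compare the results. The following is only a rough sketch under stated assumptions: the names parse_ring and diff_views are hypothetical helpers, not part of Cassandra or nodeprobe, and the parser assumes one node row per line in the column order shown above (address, status, load, token). The sample rows are copied from the two listings in this thread.

```python
import re

# One node row of the ring listing: address, Up/Down, load, token.
# Address and status can be fused (e.g. "172.27.128.186Down"), so the
# pattern does not require whitespace between them.
ROW = re.compile(r"(\d+\.\d+\.\d+\.\d+)\s*(Up|Down)\s+([\d.]+ \w+)\s+(\d+)")


def parse_ring(text):
    """Return {address: (status, token)} for every node row found (hypothetical helper)."""
    return {
        addr: (status, int(token))
        for addr, status, _load, token in ROW.findall(text)
    }


def diff_views(first, second):
    """List the ways two nodes' views of the ring disagree (hypothetical helper)."""
    problems = []
    for addr in sorted(set(first) | set(second)):
        if addr not in first:
            problems.append(f"{addr}: missing from first view")
        elif addr not in second:
            problems.append(f"{addr}: missing from second view")
        elif first[addr] != second[addr]:
            problems.append(f"{addr}: {first[addr]} vs {second[addr]}")
    return problems


# Rows copied from the gen-app02 listing in this thread.
GEN_APP02_VIEW = """
172.27.128.186Down 22.17 MB 31124469348629903091013930339840898757
172.27.128.23 Down 22.17 MB 64378740291415296162944450043143967518
172.27.128.22 Up 22.17 MB 121134220722269938669001112695509564769
172.27.128.185Up 14.69 MB 144038903974614862325597275257769797985
"""

# Rows copied from the vmguest85 listing in this thread.
VMGUEST85_VIEW = """
172.27.128.22 Up 22.17 MB 121134220722269938669001112695509564769
172.27.128.185Up 14.69 MB 144038903974614862325597275257769797985
"""

if __name__ == "__main__":
    for problem in diff_views(parse_ring(GEN_APP02_VIEW), parse_ring(VMGUEST85_VIEW)):
        print(problem)
```

Run against the two listings above, this reports that 172.27.128.186 and 172.27.128.23 are missing from vmguest85's view, which matches the symptom described: vmguest85 has never been told about the two nodes that are down. In practice the text would come from capturing the nodeprobe output on each host rather than from pasted constants.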