From: Rene Kochen
To: "user@cassandra.apache.org"
Subject: RE: Node down
Date: Thu, 2 Feb 2012 08:44:45 +0000
Message-ID: <08F4474CF6CB91419BBE546AD49AEA44EA75@MAIL.office.eventis.nl>
In-Reply-To: <645CB9FD-B5AE-4026-B507-42B6DCBA8D3A@thelastpickle.com>

A restart of node1 fixed the problem.

The only thing I saw in the log of node1 before the problem was the following:

InetAddress /172.27.70.135 is now dead.
InetAddress /172.27.70.135 is now UP

After this, the nodetool ring command showed node 172.27.70.135 as dead.

You mention a "stored ring view". Can it be that this stored ring view was out of sync with the actual (gossip) situation?

Thanks!

Rene

From: aaron morton [mailto:aaron@thelastpickle.com]
Sent: Wednesday, 1 February 2012 21:03
To: user@cassandra.apache.org
Subject: Re: Node down

Without knowing much more, I would try this...

* Restart each node in turn and watch the logs to see what it says about the other.
* If the restart does not fix it, try using the -Dcassandra.load_ring_state=false JVM option when starting the node. That will tell it to ignore its stored ring view and use what gossip is telling it. Add it as a new line at the bottom of cassandra-env.sh.

If it's still failing, watch the logs and see what it says when it marks the other as being down.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/02/2012, at 11:12 PM, Rene Kochen wrote:

I have a cluster with seven nodes.

If I run the nodetool ring command on all nodes, I see the following:

Node1 says that node2 is down.
Node2 says that node1 is down.
All other nodes say that everyone is up.

Is this normal behavior?

I see no network-related problems, and none between node1 and node2.

I use Cassandra 0.7.10.

Thanks,

Rene
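For reference, the JVM option suggested in the reply above might be added to cassandra-env.sh roughly as follows. This is a minimal sketch, not taken from the thread: it assumes the script collects its options in a JVM_OPTS variable, as the stock file does, and that the line is appended at the bottom as the reply describes.

```shell
# Hypothetical last line of cassandra-env.sh (assumes options are
# accumulated in JVM_OPTS, as in the stock script).
# Asks the node to ignore its stored ring view at startup and rely on
# what gossip reports instead.
JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
```

The option only takes effect when the node is started, so the node has to be restarted after the line is added. It is probably reasonable to remove the line again once the ring view looks correct.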
