From: Rene Kochen
To: user@cassandra.apache.org
Date: Thu, 18 Oct 2012 16:32:01 +0200
Subject: Re: UnreachableNodes

Thanks Aaron,

Telnet works (in both directions).

After a normal restart (i.e. without discarding ring state) of the node
reporting the other one as down, the ring shows "up" again. So a node
restart fixes the incorrect state.

I see this error occasionally.

I will investigate further and post more details when it happens again.

2012/10/18 aaron morton <aaron@thelastpickle.com>

> You can double check that the node reporting 9.109 as down can telnet to
> port 7000 on 9.109.
>
> Then I would restart 9.109 with -Dcassandra.load_ring_state=false added
> as a JVM param in cassandra-env.sh.
>
> If it still shows as down, can you post the output from nodetool
> gossipinfo from 9.109 and the node that sees 9.109 as down.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/10/2012, at 8:45 PM, Rene Kochen <rene.kochen@schange.com> wrote:
>
> I have a four node EC2 cluster.
>
> Three machines show via nodetool ring that all machines are UP.
> One machine shows via nodetool ring that one machine is DOWN.
>
> If I take a closer look at the machine reporting the other machine as
> down, I see the following:
>
> - StorageService.UnreachableNodes = 10.49.9.109
> - FailureDetector.SimpleStates: 10.49.9.109 = UP
>
> So gossip is fine. Actually the whole 10.49.9.109 machine is fine. I see
> in the logging that there is communication between 10.49.9.109 and the
> machine reporting it as down.
>
> How or when is a node removed from the UnreachableNodes list and reported
> as UP again via nodetool ring?
>
> I use Cassandra 1.0.11
>
> Thanks!
>
> Rene

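
For anyone repeating the connectivity check discussed in this thread, the telnet test against port 7000 can also be scripted. The following is only a minimal sketch, not anything from Cassandra itself: the function name `can_connect` and the 3-second timeout are assumptions made for illustration, and port 7000 is simply the port Aaron mentions. It only shows that a TCP connection can be opened; it says nothing about gossip state.

```python
import socket


def can_connect(host: str, port: int = 7000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper: roughly what a successful `telnet host port` tells you.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Example (run from the node that reports the other one as down):
#   can_connect("10.49.9.109")
```

As in the thread, the check is worth running in both directions, since a one-way firewall rule could let one node connect while blocking the other.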