From: S C
To: user@cassandra.apache.org
Subject: RE: gossip not working
Date: Thu, 4 Apr 2013 19:59:45 -0500

I am not seeing anything in the logs other than "Starting up server gossip", and there is no firewall between the nodes.

From: paulsudol@gmail.com
Subject: Re: gossip not working
Date: Thu, 4 Apr 2013 18:49:29 -0500
To: user@cassandra.apache.org

What errors are you seeing in the log files of the down nodes? Did you run upgradesstables? You need to run upgradesstables when moving from < 1.1.7 to 1.1.9.
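The upgradesstables rule in the reply can be written as a simple version check. This is a minimal Python sketch. `parse_version` and `needs_upgradesstables` are hypothetical helpers. Applying the 1.1.7 threshold to any target at or above 1.1.7, rather than only to 1.1.9, is an assumption, because the reply only states the 1.1.9 case.

```python
# Hypothetical helpers illustrating the upgradesstables rule from the reply.
# Assumption: the 1.1.7 threshold applies to any target version >= 1.1.7,
# not only to 1.1.9 (the reply only mentions 1.1.9).

UPGRADESSTABLES_THRESHOLD = (1, 1, 7)


def parse_version(version):
    """Turn a dotted version such as "1.1.5" into (1, 1, 5) for comparison."""
    return tuple(int(part) for part in version.split("."))


def needs_upgradesstables(current, target):
    """True when an upgrade crosses from below 1.1.7 to 1.1.7 or later."""
    return parse_version(current) < UPGRADESSTABLES_THRESHOLD <= parse_version(target)


if __name__ == "__main__":
    # The upgrade described in this thread: 1.1.5 -> 1.1.9
    print(needs_upgradesstables("1.1.5", "1.1.9"))
```

For the upgrade in this thread (1.1.5 to 1.1.9) the check returns True, which matches the reply's advice to run `nodetool upgradesstables`.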

On Apr 4, 2013, at 6:11 PM, S C <asf11@outlook.com> wrote:

I was in the middle of an upgrade to 1.1.9. I brought up one node on 1.1.9 while the others were still running 1.1.5. Once that node was on 1.1.9, it no longer recognized the other nodes in the ring.
On 192.168.56.10 and 192.168.56.11:

192.168.56.10  DC1-Cass  RAC1  Up    Normal  28.06 GB  50.00%  0
192.168.56.11  DC1-Cass  RAC1  Up    Normal  31.59 GB  25.00%  42535295865117307932921825928971026432
192.168.56.12  DC1-Cass  RAC1  Down  Normal  29.02 GB  25.00%  85070591730234615865843651857942052864
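The Owns column follows from the tokens alone. This sketch assumes the cluster uses RandomPartitioner, which is the 1.1 default and has a token space running from 0 to 2**127. The tokens here are exactly 0, 2**125 and 2**126, which is consistent with that assumption. Under RandomPartitioner, each node owns the range from the previous node's token up to its own, wrapping around at the top of the ring. The hypothetical `ownership` helper below sketches that calculation; it is not nodetool's own code.

```python
# Sketch: derive the "Owns" percentages from the tokens.
# Assumption: RandomPartitioner, whose token space runs from 0 to 2**127.

RING_SIZE = 2 ** 127


def ownership(tokens):
    """Map each token to the fraction of the ring it owns.

    A node owns the range (previous token, its own token]; for the lowest
    token the previous token is the highest one, wrapping around the ring.
    """
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return {ordered[0]: 1.0}
    shares = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # for i == 0 this is the highest token
        shares[token] = ((token - previous) % RING_SIZE) / RING_SIZE
    return shares


if __name__ == "__main__":
    ring = {
        "192.168.56.10": 0,
        "192.168.56.11": 42535295865117307932921825928971026432,
        "192.168.56.12": 85070591730234615865843651857942052864,
    }
    shares = ownership(ring.values())
    for address, token in ring.items():
        print(f"{address}  {shares[token]:.2%}")
```

This prints 50.00%, 25.00% and 25.00% for .10, .11 and .12. The figures depend only on tokens, so they are identical in both ring views in this thread even though the Up/Down columns differ.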


On 192.168.56.12:

192.168.56.10  DC1-Cass  RAC1  Down  Normal  28.06 GB  50.00%  0
192.168.56.11  DC1-Cass  RAC1  Down  Normal  31.59 GB  25.00%  42535295865117307932921825928971026432
192.168.56.12  DC1-Cass  RAC1  Up    Normal  29.02 GB  25.00%  85070591730234615865843651857942052864
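Put side by side, the two views disagree symmetrically. Both .10 and .11 report .12 Down, and .12 reports both .10 and .11 Down, so the Down reports are mutual and line up with the 1.1.5/1.1.9 boundary described above. This small Python sketch makes that comparison mechanical. The helper names are hypothetical, and the parsing assumes the whitespace-separated `nodetool ring` line layout shown above, with Status as the fourth field.

```python
# Sketch: compare the Up/Down status each node reports for its peers.
# Ring lines are copied from the output above; parsing assumes
# whitespace-separated columns with Status as the 4th field.

RING_ON_10_AND_11 = """\
192.168.56.10  DC1-Cass  RAC1  Up    Normal  28.06 GB  50.00%  0
192.168.56.11  DC1-Cass  RAC1  Up    Normal  31.59 GB  25.00%  42535295865117307932921825928971026432
192.168.56.12  DC1-Cass  RAC1  Down  Normal  29.02 GB  25.00%  85070591730234615865843651857942052864
"""

RING_ON_12 = """\
192.168.56.10  DC1-Cass  RAC1  Down  Normal  28.06 GB  50.00%  0
192.168.56.11  DC1-Cass  RAC1  Down  Normal  31.59 GB  25.00%  42535295865117307932921825928971026432
192.168.56.12  DC1-Cass  RAC1  Up    Normal  29.02 GB  25.00%  85070591730234615865843651857942052864
"""

# Which ring output each observer printed (.10 and .11 showed the same view).
VIEWS = {
    "192.168.56.10": RING_ON_10_AND_11,
    "192.168.56.11": RING_ON_10_AND_11,
    "192.168.56.12": RING_ON_12,
}


def parse_status(ring_text):
    """Return {address: "Up" | "Down"} from nodetool ring lines."""
    status = {}
    for line in ring_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[3] in ("Up", "Down"):
            status[fields[0]] = fields[3]
    return status


def down_pairs(views):
    """Sorted (observer, peer) pairs where the observer reports the peer Down."""
    return sorted(
        (observer, peer)
        for observer, ring_text in views.items()
        for peer, state in parse_status(ring_text).items()
        if state == "Down"
    )


def is_mutual(pairs):
    """True when every Down report is matched by one in the opposite direction."""
    reported = set(pairs)
    return all((peer, observer) in reported for observer, peer in reported)


if __name__ == "__main__":
    pairs = down_pairs(VIEWS)
    for observer, peer in pairs:
        print(f"{observer} sees {peer} as Down")
    print("mutual:", is_mutual(pairs))
```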


I do not see anything in the logs that tells me that there is a gossip issue.
nodetool info (192.168.56.12)
Token            : 85070591730234615865843651857942052864
Gossip active    : true
Thrift active    : true
Load             : 29.05 GB
Generation No    : 1365114563
Uptime (seconds) : 2127
Heap Memory (MB) : 848.71 / 7945.94
Exceptions       : 0
Key Cache        : size 2208 (bytes), capacity 104857584 (bytes), 1056 hits, 1099 requests, 0.961 recent hit rate, 14400 save period in seconds
Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds

nodetool info (192.168.56.11)
Token            : 42535295865117307932921825928971026432
Gossip active    : true
Thrift active    : true
Load             : 31.59 GB
Generation No    : 1364413038
Uptime (seconds) : 703904
Heap Memory (MB) : 733.02 / 7945.94
Exceptions       : 1
Key Cache        : size 3693312 (bytes), capacity 104857584 (bytes), 26071678 hits, 26616282 requests, 0.980 recent hit rate, 14400 save period in seconds
Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
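The Generation No and Uptime fields from the two `nodetool info` outputs can be cross-checked against each other. This sketch assumes the generation number is the Unix time in seconds at which the node started, which is how Cassandra of this era normally assigns it. Under that assumption, generation plus uptime should land near the moment `nodetool info` was run. The helper names below are hypothetical.

```python
# Sketch: interpret "Generation No" and "Uptime (seconds)" from nodetool info.
# Assumption: the generation number is the Unix time (seconds) at startup.

from datetime import datetime, timedelta, timezone

INFO = {
    "192.168.56.12": {"generation": 1365114563, "uptime_s": 2127},
    "192.168.56.11": {"generation": 1364413038, "uptime_s": 703904},
}


def started_at(generation):
    """Startup time (UTC) implied by a generation number."""
    return datetime.fromtimestamp(generation, tz=timezone.utc)


def sampled_at(generation, uptime_s):
    """Approximate time nodetool info was run: startup time plus uptime."""
    return started_at(generation) + timedelta(seconds=uptime_s)


if __name__ == "__main__":
    for address, info in INFO.items():
        start = started_at(info["generation"])
        sample = sampled_at(info["generation"], info["uptime_s"])
        print(address, "started", start.isoformat(), "sampled", sample.isoformat())
```

On these numbers, .12 started at 2013-04-04 22:29:23 UTC and .11 on 2013-03-27 at 19:37:18 UTC. The two implied sampling times are 252 seconds apart, which fits the assumption. The numbers also mark .12 as the recently restarted node, presumably the one upgraded to 1.1.9.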



There is no firewall between the nodes, and they can reach each other on the storage port.
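The storage-port claim can be checked from every node with a plain TCP connect. This is a minimal Python sketch with some assumptions. The port number 7000 is assumed because it is the usual default `storage_port` in cassandra.yaml; the thread does not state the actual value. `port_open` is a hypothetical helper. A successful connect only shows that the port accepts connections; it does not show that the nodes accept each other's gossip messages.

```python
# Sketch: check TCP reachability of each node's storage (inter-node) port.
# Assumption: storage_port is the default 7000; verify it in cassandra.yaml.

import socket

NODES = ["192.168.56.10", "192.168.56.11", "192.168.56.12"]
STORAGE_PORT = 7000  # assumed default, not stated in the thread


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    for node in NODES:
        state = "open" if port_open(node, STORAGE_PORT) else "unreachable"
        print(node, STORAGE_PORT, state)
```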
What else should I be looking at to find the root cause? I'd appreciate your input.
