From: S C
To: "user@cassandra.apache.org"
Subject: gossip not working
Date: Thu, 4 Apr 2013 18:11:36 -0500

I was in the middle of an upgrade to 1.1.9. I brought up one node on 1.1.9 while the others were still running 1.1.5. Once that node was on 1.1.9, it no longer recognized the other nodes in the ring.

On 192.168.56.10 and 192.168.56.11

192.168.56.10  DC1-Cass  RAC1  Up    Normal  28.06 GB  50.00%  0
192.168.56.11  DC1-Cass  RAC1  Up    Normal  31.59 GB  25.00%  42535295865117307932921825928971026432
192.168.56.12  DC1-Cass  RAC1  Down  Normal  29.02 GB  25.00%  85070591730234615865843651857942052864


On 192.168.56.12

192.168.56.10  DC1-Cass  RAC1  Down  Normal  28.06 GB  50.00%  0
192.168.56.11  DC1-Cass  RAC1  Down  Normal  31.59 GB  25.00%  42535295865117307932921825928971026432
192.168.56.12  DC1-Cass  RAC1  Up    Normal  29.02 GB  25.00%  85070591730234615865843651857942052864
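
A rough way to make the mismatch explicit is to parse both ring views and list the nodes whose status differs. The Python sketch below assumes the whitespace-separated column layout of the lines pasted above (address, DC, rack, status, state, load, ownership, token); `parse_ring` and `disagreements` are hypothetical helper names for illustration, not part of nodetool.

```python
# Sketch: compare the ring status reported by different nodes.
# Assumption: each ring line starts with address, DC, rack, status,
# as in the output pasted in this message.

VIEW_FROM_10_AND_11 = """
192.168.56.10  DC1-Cass  RAC1  Up    Normal  28.06 GB  50.00%  0
192.168.56.11  DC1-Cass  RAC1  Up    Normal  31.59 GB  25.00%  42535295865117307932921825928971026432
192.168.56.12  DC1-Cass  RAC1  Down  Normal  29.02 GB  25.00%  85070591730234615865843651857942052864
"""

VIEW_FROM_12 = """
192.168.56.10  DC1-Cass  RAC1  Down  Normal  28.06 GB  50.00%  0
192.168.56.11  DC1-Cass  RAC1  Down  Normal  31.59 GB  25.00%  42535295865117307932921825928971026432
192.168.56.12  DC1-Cass  RAC1  Up    Normal  29.02 GB  25.00%  85070591730234615865843651857942052864
"""


def parse_ring(text):
    """Map node address -> status ('Up' or 'Down') from pasted ring lines."""
    status = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or unrelated lines
        status[fields[0]] = fields[3]
    return status


def disagreements(view_a, view_b):
    """Return {address: (status_in_a, status_in_b)} where the two views differ."""
    return {
        addr: (view_a[addr], view_b[addr])
        for addr in sorted(view_a.keys() & view_b.keys())
        if view_a[addr] != view_b[addr]
    }


if __name__ == "__main__":
    old_nodes = parse_ring(VIEW_FROM_10_AND_11)
    new_node = parse_ring(VIEW_FROM_12)
    for addr, (seen_old, seen_new) in disagreements(old_nodes, new_node).items():
        print(f"{addr}: {seen_old} from .10/.11, {seen_new} from .12")
```

With the two views above, every node is reported differently by the two sides, which is consistent with the upgraded node and the other two nodes each marking the other side as down.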


I do not see anything in the logs that tells me that there is a gossip issue.

nodetool info
Token            : 85070591730234615865843651857942052864
Gossip active    : true
Thrift active    : true
Load             : 29.05 GB
Generation No    : 1365114563
Uptime (seconds) : 2127
Heap Memory (MB) : 848.71 / 7945.94
Exceptions       : 0
Key Cache        : size 2208 (bytes), capacity 104857584 (bytes), 1056 hits, 1099 requests, 0.961 recent hit rate, 14400 save period in seconds
Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
nodetool info
Token            : 42535295865117307932921825928971026432
Gossip active    : true
Thrift active    : true
Load             : 31.59 GB
Generation No    : 1364413038
Uptime (seconds) : 703904
Heap Memory (MB) : 733.02 / 7945.94
Exceptions       : 1
Key Cache        : size 3693312 (bytes), capacity 104857584 (bytes), 26071678 hits, 26616282 requests, 0.980 recent hit rate, 14400 save period in seconds
Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds



There is no firewall between the nodes, and each node can reach the others on the storage port.
What else should I be looking at to find the root cause? I appreciate your input.
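
For reference, the reachability check mentioned above can be reproduced with a plain TCP connect. This is a minimal Python sketch, not a gossip test: it only shows whether a TCP connection to the port can be opened. The port number is an assumption (7000 is the usual default for `storage_port` in cassandra.yaml; the message does not say which port is configured), and `can_connect` and `check_peers` are hypothetical helper names.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_peers(peers, port=7000, timeout=3.0):
    """Map each peer address to whether its storage port accepts connections.

    Assumption: port 7000 is the storage port; adjust to the configured value.
    """
    return {host: can_connect(host, port, timeout) for host in peers}
```

Run from 192.168.56.12, a call such as `check_peers(["192.168.56.10", "192.168.56.11"])` would report whether each of the other nodes accepts connections on the assumed port. A successful connect rules out basic network blocking but says nothing about whether the gossip messages themselves are being accepted.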