From: S C <asf11@outlook.com>
To: user@cassandra.apache.org
Subject: RE: gossip not working
Date: Thu, 4 Apr 2013 23:08:29 -0500

Is there a way to force gossip among the nodes?

From: asf11@outlook.com
To: user@cassandra.apache.org
Subject: RE: gossip not working
Date: Thu, 4 Apr 2013 19:59:45 -0500

I am not seeing anything in the logs other than "Starting up server gossip", and there is no firewall between the nodes.

From: paulsudol@gmail.com
Subject: Re: gossip not working
Date: Thu, 4 Apr 2013 18:49:29 -0500
To: user@cassandra.apache.org

What errors are you seeing in the log files of the down nodes? Did you run upgradesstables? You need to run upgradesstables when moving from < 1.1.7 to 1.1.9.

On Apr 4, 2013, at 6:11 PM, S C <asf11@outlook.com> wrote:

I was in the middle of an upgrade to 1.1.9. I brought up one node on 1.1.9 while the others were still running 1.1.5. Once that node was on 1.1.9, it no longer recognized the other nodes in the ring.

On 192.168.56.10 and 11

192.168.56.10  DC1-Cass  RAC1  Up    Normal  28.06 GB  50.00%  0
192.168.56.11  DC1-Cass  RAC1  Up    Normal  31.59 GB  25.00%  42535295865117307932921825928971026432
192.168.56.12  DC1-Cass  RAC1  Down  Normal  29.02 GB  25.00%  85070591730234615865843651857942052864

On 192.168.56.12

192.168.56.10  DC1-Cass  RAC1  Down  Normal  28.06 GB  50.00%  0
192.168.56.11  DC1-Cass  RAC1  Down  Normal  31.59 GB  25.00%  42535295865117307932921825928971026432
192.168.56.12  DC1-Cass  RAC1  Up    Normal  29.02 GB  25.00%  85070591730234615865843651857942052864

I do not see anything in the logs that tells me that there is a gossip issue.

nodetool info
Token            : 85070591730234615865843651857942052864
Gossip active    : true
Thrift active    : true
Load             : 29.05 GB
Generation No    : 1365114563
Uptime (seconds) : 2127
Heap Memory (MB) : 848.71 / 7945.94
Exceptions       : 0
Key Cache        : size 2208 (bytes), capacity 104857584 (bytes), 1056 hits, 1099 requests, 0.961 recent hit rate, 14400 save period in seconds
Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds

nodetool info
Token            : 42535295865117307932921825928971026432
Gossip active    : true
Thrift active    : true
Load             : 31.59 GB
Generation No    : 1364413038
Uptime (seconds) : 703904
Heap Memory (MB) : 733.02 / 7945.94
Exceptions       : 1
Key Cache        : size 3693312 (bytes), capacity 104857584 (bytes), 26071678 hits, 26616282 requests, 0.980 recent hit rate, 14400 save period in seconds
Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds

There is no firewall between the nodes, and they can reach each other on the storage port.
What else should I be looking at to find the root cause? I appreciate your input.
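
As a side check on the ring output above, the ownership percentages can be reproduced from the three tokens alone, which suggests the token layout itself is consistent on both sides and that the disagreement is only about Up/Down state. The sketch below is not from the thread: it assumes the cluster uses RandomPartitioner (token space 0 to 2**127) and that each node owns the range from the previous token up to its own; the name `ownership` is made up for this illustration.

```python
from fractions import Fraction

# Assumption: RandomPartitioner, whose tokens fall in [0, 2**127).
RING_SIZE = 2 ** 127


def ownership(tokens):
    """Map each token to the fraction of the ring its node owns.

    Each node is taken to own the range (previous token, own token],
    with the lowest token wrapping around to the highest one.
    """
    ordered = sorted(tokens)
    owned = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # i == 0 wraps to the last token
        # A single-node ring yields a span of 0, which means the whole ring.
        span = (token - previous) % RING_SIZE or RING_SIZE
        owned[token] = Fraction(span, RING_SIZE)
    return owned


if __name__ == "__main__":
    tokens = [
        0,
        42535295865117307932921825928971026432,
        85070591730234615865843651857942052864,
    ]
    for token, share in ownership(tokens).items():
        print(f"{token:>40}  {float(share) * 100:.2f}%")
```

Under those assumptions the shares come out to 50%, 25%, and 25%, matching the Owns column reported by all three nodes.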