Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of boris.solovyov@gmail.com
 designates 209.85.214.194 as permitted sender)
MIME-Version: 1.0
Date: Tue, 12 Feb 2013 14:56:02 -0500
Message-ID: 
 <CADrLAEPXvJx=7jewXB_iwuaKVgkW5P9kfk-e8pqqYpM77pAR2g@mail.gmail.com>
Subject: Nodetool doesn't show two nodes
From: Boris Solovyov <boris.solovyov@gmail.com>
To: user <user@cassandra.apache.org>
Content-Type: multipart/alternative; boundary=e89a8fb205484a311804d58c6a5c

--e89a8fb205484a311804d58c6a5c
Content-Type: text/plain; charset=ISO-8859-1

I've configured a 2-node cluster in EC2, with key settings as follows:

cluster_name: 'TS'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "ec2-23-21-11-193.compute-1.amazonaws.com,ec2-107-22-114-19.compute-1.amazonaws.com"
listen_address: 10.145.232.190
broadcast_address: ec2-23-21-11-193.compute-1.amazonaws.com
rpc_address: 0.0.0.0
endpoint_snitch: Ec2MultiRegionSnitch

On the other node, the configuration is similar, but of course the listen
and broadcast addresses are different. Now, when I start Cassandra, I see
this in the logs:

INFO 19:35:32,348 JOINING: waiting for ring information

And then after 30 seconds, it says a bunch of things like this:

JOINING: schema complete, ready to bootstrap
JOINING: getting bootstrap token
Enqueuing flush of Memtable...
JOINING: sleeping 30000 ms for pending range setup
JOINING: Starting to bootstrap...
Bootstrap completed! for the tokens [....]

Finally, after some more memtable flushing,

INFO 19:36:32,710 Node /107.22.114.19 state jump to normal
INFO 19:36:32,722 Startup completed! Now serving reads.

Now, I start the other node, and I see basically the same thing in the logs.

Running nodetool status, I see what looks like two single-node clusters!

[root@ip-10-147-171-160 ~]# nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load       Tokens  Owns   Host ID                               Rack
UN  107.22.114.19     21 KB      256     100.0%  f7a24bd2-8cb9-499d-806c-d9e548f34b8d  1a

[root@ip-10-145-232-190 ~]# nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load       Tokens  Owns   Host ID                               Rack
UN  23.21.11.193      21 KB      256     100.0%  9d70f022-03cf-488a-807d-22e991761483  1a
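
To make the comparison concrete, here is a rough sketch that pulls the node
addresses out of each nodetool status output and checks whether the two views
overlap. The helper name parse_status_addresses is hypothetical, and the
parsing assumes the column layout shown above (a two-letter Status/State code
followed by an IPv4 address); it is not part of nodetool itself.

```python
import re

# A data row starts with a Status letter (U/D) and a State letter (N/L/J/M),
# followed by the node's IPv4 address. This layout is assumed from the
# output pasted above.
ROW = re.compile(r"^([UD][NLJM])\s+(\d{1,3}(?:\.\d{1,3}){3})\b")


def parse_status_addresses(output: str) -> set:
    """Return the set of node addresses listed in a nodetool status output."""
    addresses = set()
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            addresses.add(match.group(2))
    return addresses


# Output as seen on ip-10-147-171-160.
view_a = """Datacenter: us-east
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load       Tokens  Owns   Host ID                               Rack
UN  107.22.114.19     21 KB      256     100.0%  f7a24bd2-8cb9-499d-806c-d9e548f34b8d  1a
"""

# Output as seen on ip-10-145-232-190.
view_b = """Datacenter: us-east
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load       Tokens  Owns   Host ID                               Rack
UN  23.21.11.193      21 KB      256     100.0%  9d70f022-03cf-488a-807d-22e991761483  1a
"""

seen_a = parse_status_addresses(view_a)
seen_b = parse_status_addresses(view_b)

# Each node lists only itself, so the two views share no address.
print(seen_a & seen_b)  # prints set()
```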

It looks to me like the nodes didn't communicate with each other the way I
thought they would, and timed out waiting for gossip to tell them which
nodes are in the ring (I'm new to Cassandra, so this is only a guess, but
the 30-second timeouts certainly look suspicious). I checked with telnet,
and from each node I can connect to port 7000 on the other node (on both
the internal and public IPs). I feel like I have made a beginner's mistake.
Does anyone have a suggestion for where to look next?
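
For anyone who wants to repeat the connectivity check without telnet, a
minimal sketch of the same test in Python is below. The function name
can_connect and the 3-second timeout are my own choices for illustration;
port 7000 is the one I tested above. It only shows that a TCP connection can
be opened, which says nothing about whether gossip itself is working.

```python
import socket


def can_connect(host: str, port: int = 7000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from each node against the other node's internal and public addresses,
for example can_connect("23.21.11.193") from the 107.22.114.19 node, this
should return True if the telnet result holds.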

- Boris


--e89a8fb205484a311804d58c6a5c--