incubator-cassandra-user mailing list archives

From Boris Solovyov <boris.solov...@gmail.com>
Subject Re: Nodetool doesn't shows two nodes
Date Sun, 17 Feb 2013 21:27:25 GMT
Hi,

I've checked everything Alain suggested and set up a fresh 2-node cluster,
and I still get the same result: each node lists only itself.

This time I made the following changes:

   - I set listen_address to the public DNS name. Internally, AWS's DNS
   will map this to the 10.x IP, so this should work correctly if I
   understand right. These are new EC2 instances, and I did not trust
   the configured hostname and so on.
   - I opened all ports between the nodes in the security group.
   - I kept the snitch at Ec2MultiRegionSnitch. This cluster is small now
   but it will be very large and nationwide if I succeed and choose Cassandra
   for this purpose. Do I understand correctly that it is not possible to
   change this later, or at least not easy?
   - I applied all of Alain's suggestions; for example, cluster_name is the
   same on all nodes.
   - I set the seed list to the public DNS name of the first node. This is
   identical on both nodes.
   - I checked Alain's suggestion about auto_bootstrap. The docs say it does
   not need to be set. Are the docs wrong? (I am looking at the DataStax 1.2
   PDF docs.)

Here is some more debugging evidence. On node 1, the seed,

[root@ip-10-113-19-24 ~]# ifconfig | grep inet.addr
          inet addr:10.113.19.24  Bcast:10.113.19.255  Mask:255.255.254.0
[root@ip-10-113-19-24 ~]# nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load       Tokens  Owns   Host ID                                Rack
UN  23.22.204.201     20.97 KB   256     100.0%  4fadd4fd-c57c-4172-95aa-092368ba5743  1a
[root@ip-10-113-19-24 ~]# netstat -antp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp        0      0 0.0.0.0:7199                0.0.0.0:*                   LISTEN      1910/java
tcp        0      0 0.0.0.0:47298               0.0.0.0:*                   LISTEN      1910/java
tcp        0      0 0.0.0.0:57030               0.0.0.0:*                   LISTEN      1910/java
tcp        0      0 0.0.0.0:9160                0.0.0.0:*                   LISTEN      1910/java
tcp        0      0 0.0.0.0:9042                0.0.0.0:*                   LISTEN      1910/java
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      1231/sshd
tcp        0      0 10.113.19.24:7000           0.0.0.0:*                   LISTEN      1910/java
tcp        0      1 10.113.19.24:38948          54.234.147.60:7000          SYN_SENT    1910/java
tcp        0      0 10.113.19.24:7000           10.113.19.24:45328          ESTABLISHED 1910/java
tcp        0      0 10.113.19.24:7000           10.114.205.157:47713        ESTABLISHED 1910/java
tcp        0      1 10.113.19.24:45597          23.22.204.201:7000          SYN_SENT    1910/java
tcp        0      0 10.113.19.24:45328          10.113.19.24:7000           ESTABLISHED 1910/java

And in the log,

 INFO 20:58:12,472 Node /23.22.204.201 state jump to normal
 INFO 20:58:12,482 Startup completed! Now serving reads.

Now, this looks similar to the earlier problem, with private IP addresses
being used sometimes and public ones at other times. By the way, the other
node, whose internal IP address is 10.114.205.157, is connected to this seed
node, as you can see.

I think I could understand this problem if I knew which types of
network connections I should expect to see in netstat, and what output
I should expect to see in the log. Can someone with more experience tell me
what is wrong or unexpected above? And am I working against Amazon's
architecture by using IPs the way I do?
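
One way to read the netstat output above is to group the port-7000 connections by peer address and state. Here is a minimal Python sketch, assuming the rows are copied from that output with the wrapped lines rejoined. The names NETSTAT_ROWS, STORAGE_PORT, split_endpoint, and classify are illustrative only and are not part of Cassandra or nodetool.

```python
import ipaddress

# Rows copied from the netstat output above (wrapped lines rejoined).
NETSTAT_ROWS = """\
tcp 0 0 10.113.19.24:7000 0.0.0.0:* LISTEN 1910/java
tcp 0 1 10.113.19.24:38948 54.234.147.60:7000 SYN_SENT 1910/java
tcp 0 0 10.113.19.24:7000 10.113.19.24:45328 ESTABLISHED 1910/java
tcp 0 0 10.113.19.24:7000 10.114.205.157:47713 ESTABLISHED 1910/java
tcp 0 1 10.113.19.24:45597 23.22.204.201:7000 SYN_SENT 1910/java
tcp 0 0 10.113.19.24:45328 10.113.19.24:7000 ESTABLISHED 1910/java
"""

# Assumption: inter-node traffic uses port 7000, as mentioned in the thread.
STORAGE_PORT = "7000"


def split_endpoint(endpoint):
    """Split 'host:port' into (host, port)."""
    host, _, port = endpoint.rpartition(":")
    return host, port


def classify(rows, port=STORAGE_PORT):
    """Return (peer_ip, 'private'|'public', state) for non-LISTEN rows on `port`."""
    result = []
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] != "tcp":
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        _, local_port = split_endpoint(local)
        peer_ip, peer_port = split_endpoint(foreign)
        # Skip listening sockets and anything not involving the storage port.
        if state == "LISTEN" or port not in (local_port, peer_port):
            continue
        scope = "private" if ipaddress.ip_address(peer_ip).is_private else "public"
        result.append((peer_ip, scope, state))
    return result


for peer_ip, scope, state in classify(NETSTAT_ROWS):
    print(f"{peer_ip:<16} {scope:<8} {state}")
```

On these rows, the only connections stuck in SYN_SENT are the two outbound ones to public addresses (54.234.147.60 and 23.22.204.201), while every ESTABLISHED connection has a 10.x peer.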

While I wait for an answer, I will shut down, delete all data, and reconfigure
with public IP addresses explicitly instead of DNS names :-) I have a
feeling this is the problem. From within an Amazon EC2 server, a DNS lookup
of a public DNS name returns the private IP address. (However, I still
feel unsure about the right way to do this, because I do not know whether
Cassandra will resolve names through DNS and end up trying to connect to a
private IP that Cassandra is not listening on.)
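
The DNS behaviour described here can be checked from each node before touching the Cassandra configuration. Here is a minimal Python sketch using only the standard library. The helper names describe_address and resolve_and_describe are hypothetical. It shows only what the local resolver returns and makes no claim about how Cassandra itself resolves names.

```python
import ipaddress
import socket


def describe_address(ip):
    """Label an IPv4/IPv6 address string as 'private' or 'public'."""
    return "private" if ipaddress.ip_address(ip).is_private else "public"


def resolve_and_describe(hostname):
    """Resolve `hostname` with the local resolver and label the result."""
    ip = socket.gethostbyname(hostname)
    return ip, describe_address(ip)
```

Calling resolve_and_describe("ec2-23-21-11-193.compute-1.amazonaws.com") on an instance and again from outside EC2 would show whether the same name maps to a 10.x address in one place and a public address in the other.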

Thanks,
- Boris



On Wed, Feb 13, 2013 at 10:37 AM, Boris Solovyov
<boris.solovyov@gmail.com> wrote:

> Thank you Alain. I will check the things you suggest and report my results.
>
> - Boris
>
>
> On Wed, Feb 13, 2013 at 7:54 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
>
>> Hi Boris.
>>
>> "I feel like I have made a beginner's mistake"
>> That's a horrible feeling :D. I'll try to help ;)
>>
>> "cluster_name: 'TS'"
>> Are you sure you used the same name for both nodes?
>>
>> "I can connect to port 7000"
>> You can check all the ports needed here:
>> http://www.datastax.com/docs/1.2/install/install_ami and open them in the
>> security group once and for all, so you won't have to wonder about this again.
>>
>> "listen_address: 10.145.232.190"
>> "INFO 19:36:32,710 Node /107.22.114.19 state jump to normal"
>> "10.145.232.190" is defined as the listen address, but your logs say
>> that 107.22.114.19 joined the ring, and your second IP seems to be
>> 23.21.11.193... When you stop an EC2 server, its internal IP may change,
>> so I recommend restarting servers rather than stopping them. Anyway, you
>> should use instance stores and not EBS; instance-store servers can't be
>> stopped, so you won't have this issue anymore. Don't trust ip-10-145-232-190,
>> which is configured at first start in /etc/hostname.
>>
>> "endpoint_snitch: Ec2MultiRegionSnitch"
>> Maybe you should use endpoint_snitch: Ec2Snitch, since all your servers
>> are in the same zone. If you do, you will have to use private IPs
>> everywhere and comment out broadcast_address.
>>
>>
>> The first node has to start with auto_bootstrap: false, while the 2nd one
>> can use auto_bootstrap: true. Only your first node should be a seed; a
>> bootstrapping node must not be defined as a seed.
>>
>> "my guess... certainly 30-second timeouts look suspicious"
>> This is not a timeout but rather a sleep and it is a normal wait while
>> adding a node.
>>
>> Since you're a new user, I guess you have no data. If you want to try a
>> different configuration, you can always "reset" your Cassandra node by
>> removing .../cassandra/* (commitlog, data and saved_caches) after stopping
>> Cassandra.
>>
>> Good luck with this.
>>
>>  Alain
>>
>>
>> 2013/2/12 Boris Solovyov <boris.solovyov@gmail.com>
>>
>>> I've configured a 2-node cluster in EC2, with key settings as follows:
>>>
>>> cluster_name: 'TS'
>>> num_tokens: 256
>>> seed_provider:
>>>     - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>>>       parameters:
>>>           - seeds: "ec2-23-21-11-193.compute-1.amazonaws.com,ec2-107-22-114-19.compute-1.amazonaws.com"
>>> listen_address: 10.145.232.190
>>> broadcast_address: ec2-23-21-11-193.compute-1.amazonaws.com
>>> rpc_address: 0.0.0.0
>>> endpoint_snitch: Ec2MultiRegionSnitch
>>>
>>> On the other node, it is similar, but of course the listen and broadcast
>>> addresses are different. Now, when I start Cassandra, I see in the logs
>>>
>>> INFO 19:35:32,348 JOINING: waiting for ring information
>>>
>>> And then after 30 seconds, it says a bunch of things like this:
>>>
>>> JOINING: schema complete, ready to bootstrap
>>> JOINING: getting bootstrap token
>>> Enqueuing flush of Memtable...
>>> JOINING: sleeping 30000 ms for pending range setup
>>> JOINING: Starting to bootstrap...
>>> Bootstrap completed! for the tokens [....]
>>>
>>> Finally, after some more memtable flushing,
>>>
>>> INFO 19:36:32,710 Node /107.22.114.19 state jump to normal
>>> INFO 19:36:32,722 Startup completed! Now serving reads.
>>>
>>> Now, I start the other node, and I see basically the same thing in the
>>> logs.
>>>
>>> Running nodetool status, I see what looks like two single-node clusters!
>>>
>>> [root@ip-10-147-171-160 ~]# nodetool status
>>> Datacenter: us-east
>>> ===================
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address           Load       Tokens  Owns   Host ID                                Rack
>>> UN  107.22.114.19     21 KB      256     100.0%  f7a24bd2-8cb9-499d-806c-d9e548f34b8d  1a
>>>
>>> [root@ip-10-145-232-190 ~]# nodetool status
>>> Datacenter: us-east
>>> ===================
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address           Load       Tokens  Owns   Host ID                                Rack
>>> UN  23.21.11.193      21 KB      256     100.0%  9d70f022-03cf-488a-807d-22e991761483  1a
>>>
>>> It looks to me like the nodes didn't communicate with each other as I
>>> thought they would, and timed out waiting for gossip to tell them which
>>> nodes are in the ring (I'm new to Cassandra, but this is my guess...
>>> certainly 30-second timeouts look suspicious). I checked with telnet, and
>>> from each node I can connect to port 7000 on the other node (both on
>>> internal and public IP). I feel like I have made a beginner's mistake.
>>> Does anyone have a suggestion for where to look next?
>>>
>>> - Boris
>>>
>>
>>
>
