cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: New nodes won't bootstrap on .66
Date Thu, 28 Oct 2010 10:44:14 GMT
The best approach is to select the tokens manually; see the Load Balancing section of http://wiki.apache.org/cassandra/Operations
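
As a rough illustration of manual token selection, here is a minimal sketch of the arithmetic, assuming the RandomPartitioner (token space 0 to 2**127) and a goal of evenly spaced tokens. The helper name balanced_tokens and the node count of 5 are assumptions made for the example, not anything taken from the cluster in this thread.

```python
# Evenly spaced tokens for RandomPartitioner, whose token space runs from 0 to 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return one token per node, spaced evenly around the ring (hypothetical helper)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Assumed target size: the four existing nodes plus one new node.
    for index, token in enumerate(balanced_tokens(5)):
        print(index, token)
```

A value chosen this way would go in the new node's config before it bootstraps. Fully even spacing also means moving the existing nodes with nodetool move, so treat the output as a starting point rather than a recipe.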
Also, are there any log messages on the existing nodes or the new one that mention each other?


Is this a production system? Is it still running?

Sorry, there is not a lot to go on; it sounds like you've done the right thing. I'm assuming
things like the Cluster Name, seed list and port numbers are set correctly, since the new node
got some data.

You'll need to dig through the logs a bit more to confirm that the bootstrapping started and
to find the last message it logged.
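
One way to do that digging, sketched in Python: filter a node's log for lines that mention the other node's address and look at the last one. The function name lines_mentioning is hypothetical, and the sample lines are the two Gossiper messages quoted further down in this thread; in practice you would read the lines from the node's log file.

```python
import re


def lines_mentioning(log_lines, address):
    """Return the log lines that mention the given IP address (hypothetical helper)."""
    # The digit guards stop 192.168.2.17 from also matching 192.168.2.170.
    pattern = re.compile(r"(?<!\d)" + re.escape(address) + r"(?!\d)")
    return [line for line in log_lines if pattern.search(line)]


if __name__ == "__main__":
    sample = [
        "INFO [Timer-0] 2010-10-26 02:13:20,340 Gossiper.java (line 406) "
        "FatClient /192.168.2.17 has been silent for 3600000ms, removing from gossip",
        "INFO [GMFD:1] 2010-10-26 02:13:51,398 Gossiper.java (line 591) "
        "Node /192.168.2.17 is now part of the cluster",
    ]
    matches = lines_mentioning(sample, "192.168.2.17")
    print(len(matches), "matching lines; last one:")
    print(matches[-1])
```

Running the same filter on each existing node and on the new one shows whether they ever logged anything about each other and where the trail stops.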

Good Luck. 
Aaron

On 27 Oct 2010, at 22:40, Dimitry Lvovsky wrote:

> Hi Aaron,
> Thanks for your reply.
> 
> Unfortunately, we still haven't solved this.
> 
> How did you start the bootstrap for the .18 node?
> 
> Standard way: we set "AutoBootstrap" to true and added all the servers from the working ring as seeds.
>  
> Was it the .18 or the .17 node you tried to add?
> 
> We first tried adding .17. It streamed for a while, took on 50GB of load, and stopped streaming, but then didn't enter the ring. We left it for a few days to see if it would come in, but no luck. After that we ran the decommission and removeToken operations (in that order).
> 
> Since we couldn't get .17 in, we tried again with .18. Before doing so we increased RpcTimeoutInMillis from 1000 to 10000, having read that the timeout may be a cause of nodes not entering the ring. It has been going since Friday and still, like .17, won't come into the ring.
> 
> Does it have a token in the config or did you use nodetool move to set it?
> No, we didn't set the token manually in the config; rather, we were relying on the token to be assigned during bootstrap by the RandomPartitioner.
> 
> Again thanks for the help.
> 
> Dimitry.
>   
> 
>  
> On Tue, Oct 26, 2010 at 10:14 PM, Aaron Morton <aaron@thelastpickle.com> wrote:
> Dimitry, did you get anywhere with this?
> 
> Was it the .18 or the .17 node you tried to add? How did you start the bootstrap for the .18 node? Does it have a token in the config or did you use nodetool move to set it?
> 
> I had a quick look at the code. AFAIK the message about removing the fat client is logged when the node does not have a record of the token the other node has.
> 
> Aaron
> 
> On 26 Oct 2010, at 10:42 PM, Dimitry Lvovsky <dimitry@reviewpro.com> wrote:
> 
>> Hi All,
>> We recently upgraded from .65 to .66, after which we tried adding a new node to our cluster. We left it bootstrapping and after 3 days it still refused to join the ring. The strange thing is that nodetool info shows 50GB of load and nodetool ring shows that it sees the rest of the ring, which it is not part of. We tried the process again with another server -- again the same thing as before:
>> 
>> 
>> //from machine 192.168.2.18
>> 
>> 
>> /opt/cassandra/bin/nodetool -h localhost -p 8999 info
>> 131373516047318302934572185119435768941
>> Load : 52.85 GB
>> Generation No : 1287761987
>> Uptime (seconds) : 323157
>> Heap Memory (MB) : 795.42 / 1945.63
>> 
>> 
>> /opt/cassandra/bin/nodetool -h localhost -p 8999 ring
>> Address       Status  Load      Range                                     Ring
>>                                 158573510920250391466717289405976537674
>> 192.168.2.22  Up      59.45 GB  28203205416427384773583427414698832202    |<--|
>> 192.168.2.23  Up      44.95 GB  60562227403709245514637766500430120055    |   |
>> 192.168.2.20  Up      47.15 GB  104160057322065544623939416372654814065   |   |
>> 192.168.2.21  Up      61.04 GB  158573510920250391466717289405976537674   |-->|
>> 
>> /opt/cassandra/bin/nodetool -h localhost -p 8999 streams
>> Mode: Bootstrapping
>> Not sending any streams.
>> Not receiving any streams.
>> 
>> 
>> What's more, while looking at the log of one of the nodes I see gossip messages from 192.168.2.17, the first node we tried to add to the cluster, which was not running at the time of the log message:
>> INFO [Timer-0] 2010-10-26 02:13:20,340 Gossiper.java (line 406) FatClient /192.168.2.17 has been silent for 3600000ms, removing from gossip
>> INFO [GMFD:1] 2010-10-26 02:13:51,398 Gossiper.java (line 591) Node /192.168.2.17 is now part of the cluster
>> 
>> 
>> Thanks in advance for the help,
>> Dimitry
> 
> 
> 
> -- 
> Dimitry Lvovsky
> Director of Engineering
> ReviewPro
> www.reviewpro.com
> +34 616 337 103

