Dimitry, did you get anywhere with this?
Was it the .18 or the .17 node you tried to add? How did you start the bootstrap for the .18 node? Does it have a token in the config, or did you use nodetool move to set it?
I had a quick look at the code. AFAIK, the message about removing the fat client is logged when the node does not have a record of the token the other node has.
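To make the condition above concrete, here is a minimal, simplified model of the check as I read it. It assumes that an endpoint is dropped as a fat client only when it has been silent for longer than the timeout (3600000 ms, i.e. one hour, as in the log line quoted below) and the local node holds no token for it. This is a hypothetical sketch, not Cassandra's actual Gossiper code: `FatClientCheck`, `tokenOwners`, `lastHeard`, `heardFrom`, `recordToken` and `shouldRemoveAsFatClient` are all made-up names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of a gossip fat-client check.
// NOT Cassandra's real Gossiper implementation; names are illustrative only.
public class FatClientCheck {
    // One hour, matching the 3600000ms timeout in the log message.
    static final long FAT_CLIENT_TIMEOUT_MS = 3_600_000L;

    // Endpoints this node has a token recorded for (stand-in for token metadata).
    private final Set<String> tokenOwners = new HashSet<>();
    // Last time (in ms) gossip was heard from each endpoint.
    private final Map<String, Long> lastHeard = new HashMap<>();

    void heardFrom(String endpoint, long nowMs) {
        lastHeard.put(endpoint, nowMs);
    }

    void recordToken(String endpoint) {
        tokenOwners.add(endpoint);
    }

    // True if the endpoint has been silent too long AND we know no token for it.
    boolean shouldRemoveAsFatClient(String endpoint, long nowMs) {
        Long last = lastHeard.get(endpoint);
        if (last == null) {
            return false; // never heard from it, so nothing to remove
        }
        boolean silentTooLong = nowMs - last > FAT_CLIENT_TIMEOUT_MS;
        return silentTooLong && !tokenOwners.contains(endpoint);
    }

    public static void main(String[] args) {
        FatClientCheck check = new FatClientCheck();
        check.heardFrom("/192.168.2.17", 0L); // node that never got a token
        check.heardFrom("/192.168.2.20", 0L); // ring member with a token
        check.recordToken("/192.168.2.20");
        long later = FAT_CLIENT_TIMEOUT_MS + 1;
        System.out.println(check.shouldRemoveAsFatClient("/192.168.2.17", later)); // true
        System.out.println(check.shouldRemoveAsFatClient("/192.168.2.20", later)); // false
    }
}
```

Under this reading, a node that gossips but never had its token recorded by the others would look like a fat client to them, which fits the log messages quoted below.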
On 26 Oct 2010, at 10:42 PM, Dimitry Lvovsky <email@example.com> wrote:
We recently upgraded from .65 to .66, after which we tried adding a new node to our cluster. We left it bootstrapping, and after 3 days it still refused to join the ring. The strange thing is that nodetool info shows about 50 GB of load, and nodetool ring shows that it sees the rest of the ring, which it is not part of. We tried the process again with another server, with the same result:
// from machine 192.168.2.18
/opt/cassandra/bin/nodetool -h localhost -p 8999 info
Load : 52.85 GB
Generation No : 1287761987
Uptime (seconds) : 323157
Heap Memory (MB) : 795.42 / 1945.63
/opt/cassandra/bin/nodetool -h localhost -p 8999 ring
Address        Status  Load      Range                                      Ring
192.168.2.22   Up      59.45 GB  28203205416427384773583427414698832202     |<--|
192.168.2.23   Up      44.95 GB  60562227403709245514637766500430120055     |   |
192.168.2.20   Up      47.15 GB  104160057322065544623939416372654814065    |   |
192.168.2.21   Up      61.04 GB  158573510920250391466717289405976537674    |-->|
/opt/cassandra/bin/nodetool -h localhost -p 8999 streams
Not sending any streams.
Not receiving any streams.
What's more, while looking at the log of one of the nodes, I see gossip messages from 192.168.2.17, the first node we tried to add to the cluster, which was not running at the time of the log message:
INFO [Timer-0] 2010-10-26 02:13:20,340 Gossiper.java (line 406) FatClient /192.168.2.17 has been silent for 3600000ms, removing from gossip
INFO [GMFD:1] 2010-10-26 02:13:51,398 Gossiper.java (line 591) Node /192.168.2.17 is now part of the cluster
Thanks in advance for the help,