incubator-cassandra-user mailing list archives

From Thibaut Britz <thibaut.br...@trendiction.com>
Subject Re: New nodes won't bootstrap on .66
Date Mon, 08 Nov 2010 15:06:07 GMT
Hi,

No, I didn't solve the problem. I reinitialized the cluster and manually
assigned each node a token before adding data. There are a few messages in
multiple threads related to this, so I suspect it's very common, and I hope
it's gone with 0.7.
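
For anyone wanting to do the same, here is a minimal sketch of how evenly spaced tokens could be computed. It assumes RandomPartitioner with a token space of 0 to 2**127, and that each value is placed in that node's InitialToken setting before the node first starts; the function name and the node count of 4 are only illustrative.

```python
# Sketch: evenly spaced tokens for a ring using RandomPartitioner.
# Assumption: the token space runs from 0 to 2**127.

def balanced_tokens(node_count):
    """Return node_count tokens spread evenly around the ring."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    ring_size = 2 ** 127
    return [i * ring_size // node_count for i in range(node_count)]


# Illustrative run for a hypothetical 4-node cluster.
for index, token in enumerate(balanced_tokens(4)):
    print("node %d: %d" % (index, token))
```

Each node gets one value from the list, so no token has to be picked automatically during bootstrap.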

Thibaut




On Sun, Nov 7, 2010 at 6:57 PM, Marc Canaleta <mcanaleta@gmail.com> wrote:

> Hi,
>
> Did you solve this problem? I'm having the same problem. I'm trying to
> bootstrap a third node in a 0.6.6 cluster. It has two keyspaces: Keyspace1
> and KeyspaceLogs, both with replication factor 2.
>
> It starts bootstrapping and receives some streams, but then it keeps waiting
> for more. I enabled debug mode. These lines may be useful:
>
> DEBUG [main] 2010-11-07 17:39:50,052 BootStrapper.java (line 70) Beginning bootstrap process
> DEBUG [main] 2010-11-07 17:39:50,082 StorageService.java (line 160) Added /10.204.93.16/Keyspace1 as a bootstrap source
> ...
> DEBUG [main] 2010-11-07 17:39:50,090 StorageService.java (line 160) Added /10.204.93.16/KeyspaceLogs as a bootstrap source
> ... (streaming messages)
> DEBUG [Thread-56] 2010-11-07 17:45:51,706 StorageService.java (line 171) Removed /10.204.93.16/Keyspace1 as a bootstrap source; remaining is [/10.204.93.16]
> ...
> (and never ends).
>
> It seems it is waiting for  [/10.204.93.16] when it should be waiting for
> /10.204.93.16/KeyspaceLogs.
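
To check which sources are still outstanding without reading the whole debug log by eye, something like the following sketch may help. It assumes each log entry sits on a single unwrapped line and that the "Added ... as a bootstrap source" and "Removed ... as a bootstrap source" messages have the form shown above; the function name is made up for illustration and this is not how Cassandra tracks sources internally.

```python
import re

# Matches e.g. "Added /10.204.93.16/Keyspace1 as a bootstrap source".
EVENT = re.compile(r"(Added|Removed) /([^/\s]+)/(\S+) as a bootstrap source")


def pending_bootstrap_sources(log_lines):
    """Return the (host, keyspace) pairs that were added but never removed."""
    pending = set()
    for line in log_lines:
        match = EVENT.search(line)
        if match is None:
            continue  # not a bootstrap-source message
        action, host, keyspace = match.groups()
        if action == "Added":
            pending.add((host, keyspace))
        else:
            pending.discard((host, keyspace))
    return pending


# The three relevant lines from the excerpt above, unwrapped.
sample = [
    "DEBUG [main] 2010-11-07 17:39:50,082 StorageService.java (line 160) "
    "Added /10.204.93.16/Keyspace1 as a bootstrap source",
    "DEBUG [main] 2010-11-07 17:39:50,090 StorageService.java (line 160) "
    "Added /10.204.93.16/KeyspaceLogs as a bootstrap source",
    "DEBUG [Thread-56] 2010-11-07 17:45:51,706 StorageService.java (line 171) "
    "Removed /10.204.93.16/Keyspace1 as a bootstrap source; "
    "remaining is [/10.204.93.16]",
]
print(pending_bootstrap_sources(sample))
```

For the excerpt above, the only pair left is the KeyspaceLogs source on 10.204.93.16, which matches what the node appears to be waiting for.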
>
> The third node is 64 bits, while the two existing nodes are 32 bits. Can
> this be a problem?
>
> Thank you.
>
>
> 2010/10/28 Dimitry Lvovsky <dimitry@reviewpro.com>
>
>> Maybe your <StoragePort>7000</StoragePort> is being blocked by iptables
>> or some firewall, or maybe you have it bound (<ListenAddress> tag) to
>> localhost instead of an IP address.
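
A quick way to test this from the new node is a plain TCP connect to the storage port of an existing node. The sketch below assumes port 7000 as in the StoragePort value above; the helper name and the example address are placeholders. A successful connect only shows that the port is reachable, not that streaming will work.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example (placeholder address; replace with a node from your ring):
# print(can_connect("192.168.2.22", 7000))
```

Running it in both directions (new node to old node, and old node to new node) should show whether a firewall rule or a localhost-only ListenAddress is in the way.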
>>
>> Hope this helps,
>> Dimitry.
>>
>>
>>
>> On Thu, Oct 28, 2010 at 5:35 PM, Thibaut Britz <thibaut.britz@trendiction.com> wrote:
>>
>>> Hi,
>>>
>>> I have the same problem with 0.6.5
>>>
>>> New nodes will hang forever in bootstrap mode (no streams are being
>>> opened) and the receiver thread just waits for data forever:
>>>
>>>
>>>  INFO [Thread-53] 2010-10-27 20:33:37,399 SSTableReader.java (line 120) Sampling index for /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
>>>  INFO [Thread-53] 2010-10-27 20:33:37,444 StreamCompletionHandler.java (line 64) Streaming added /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
>>>
>>> Stack trace:
>>>
>>> "pool-1-thread-53" prio=10 tid=0x00000000412f2800 nid=0x215c runnable [0x00007fd7cf217000]
>>>    java.lang.Thread.State: RUNNABLE
>>>         at java.net.SocketInputStream.socketRead0(Native Method)
>>>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>>>         - locked <0x00007fd7e77e0520> (a java.io.BufferedInputStream)
>>>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:126)
>>>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>>>         at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:314)
>>>         at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:262)
>>>         at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:192)
>>>         at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1154)
>>>         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>         at java.lang.Thread.run(Thread.java:662)
>>>
>>> On Thu, Oct 28, 2010 at 12:44 PM, aaron morton <aaron@thelastpickle.com> wrote:
>>>
>>>> The best approach is to manually select the tokens; see the Load
>>>> Balancing section of http://wiki.apache.org/cassandra/Operations. Also:
>>>>
>>>> Are there any log messages in the existing nodes or the new one which
>>>> mention each other?
>>>>
>>>> Is this a production system? Is it still running ?
>>>>
>>>> Sorry there is not a lot to go on; it sounds like you've done the right
>>>> thing. I'm assuming things like the Cluster Name, seed list and port numbers
>>>> are set correctly, as the new node got some data.
>>>>
>>>> You'll need to dig through the logs a bit more to see that the
>>>> bootstrapping started and what the last message it logged was.
>>>>
>>>> Good Luck.
>>>> Aaron
>>>>
>>>> On 27 Oct 2010, at 22:40, Dimitry Lvovsky wrote:
>>>>
>>>> Hi Aaron,
>>>> Thanks for your reply.
>>>>
>>>> We still haven't solved this unfortunately.
>>>>
>>>>> How did you start the bootstrap for the .18 node?
>>>>
>>>>
>>>> Standard way: we set "AutoBootstrap" to true and added all the servers
>>>> from the working ring as seeds.
>>>>
>>>>
>>>>> Was it the .18 or the .17 node you tried to add
>>>>
>>>>
>>>> We first tried adding .17. It streamed for a while, took on 50GB of
>>>> load, and stopped streaming, but then didn't enter the ring. We left it
>>>> for a few days to see if it would come in, but no luck. After that we did
>>>> decommission and removeToken (in that order) operations.
>>>> Since we couldn't get .17 in, we tried again with .18. Before doing so
>>>> we increased RpcTimeoutInMillis from 1000 to 10000, having read that
>>>> a low value may cause nodes not to enter the ring. It's been
>>>> going since Friday and still, like .17, won't come into the ring.
>>>>
>>>>> Does it have a token in the config or did you use nodetool move to set
>>>>> it?
>>>>
>>>> No, we didn't manually set the token in the config; rather, we were
>>>> relying on the token being assigned during bootstrap by the
>>>> RandomPartitioner.
>>>>
>>>> Again thanks for the help.
>>>>
>>>> Dimitry.
>>>>
>>>>
>>>>
>>>> On Tue, Oct 26, 2010 at 10:14 PM, Aaron Morton <aaron@thelastpickle.com> wrote:
>>>>
>>>>> Dimitry, Did you get anywhere with this ?
>>>>>
>>>>> Was it the .18 or the .17 node you tried to add? How did you start the
>>>>> bootstrap for the .18 node? Does it have a token in the config or did
>>>>> you use nodetool move to set it?
>>>>>
>>>>> I had a quick look at the code. AFAIK the message about removing the
>>>>> fat client is logged when the node does not have a record of the token
>>>>> the other node has.
>>>>>
>>>>> Aaron
>>>>>
>>>>> On 26 Oct, 2010, at 10:42 PM, Dimitry Lvovsky <dimitry@reviewpro.com> wrote:
>>>>>
>>>>> Hi All,
>>>>> We recently upgraded from .65 to .66, after which we tried adding a new
>>>>> node to our cluster. We left it bootstrapping and after 3 days it still
>>>>> refused to join the ring. The strange thing is that nodetool info shows
>>>>> 50GB of load and nodetool ring shows that it sees the rest of the ring,
>>>>> which it is not part of. We tried the process again with another server
>>>>> and saw the same thing as before:
>>>>>
>>>>>
>>>>> //from machine 192.168.218
>>>>>
>>>>>
>>>>> /opt/cassandra/bin/nodetool -h localhost -p 8999 info
>>>>> 131373516047318302934572185119435768941
>>>>> Load : 52.85 GB
>>>>> Generation No : 1287761987
>>>>> Uptime (seconds) : 323157
>>>>> Heap Memory (MB) : 795.42 / 1945.63
>>>>>
>>>>>
>>>>> /opt/cassandra/bin/nodetool -h localhost -p 8999 ring
>>>>> Address       Status  Load      Range                                    Ring
>>>>>                                 158573510920250391466717289405976537674
>>>>> 192.168.2.22  Up      59.45 GB  28203205416427384773583427414698832202   |<--|
>>>>> 192.168.2.23  Up      44.95 GB  60562227403709245514637766500430120055   |   |
>>>>> 192.168.2.20  Up      47.15 GB  104160057322065544623939416372654814065  |   |
>>>>> 192.168.2.21  Up      61.04 GB  158573510920250391466717289405976537674  |-->|
>>>>>
>>>>> /opt/cassandra/bin/nodetool -h localhost -p 8999 streams
>>>>> Mode: Bootstrapping
>>>>> Not sending any streams.
>>>>> Not receiving any streams.
>>>>>
>>>>>
>>>>> What's more, while looking at the log of one of the nodes I see gossip
>>>>> messages from 192.168.2.17, the first node we tried to add to the cluster,
>>>>> which was not running at the time of the log message:
>>>>> INFO [Timer-0] 2010-10-26 02:13:20,340 Gossiper.java (line 406) FatClient /192.168.2.17 has been silent for 3600000ms, removing from gossip
>>>>> INFO [GMFD:1] 2010-10-26 02:13:51,398 Gossiper.java (line 591) Node /192.168.2.17 is now part of the cluster
>>>>>
>>>>>
>>>>> Thanks in advance for the help,
>>>>> Dimitry
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Dimitry Lvovsky
>>>> Director of Engineering
>>>> ReviewPro
>>>> www.reviewpro.com
>>>> +34 616 337 103
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Dimitry Lvovsky
>> Director of Engineering
>> ReviewPro
>> www.reviewpro.com
>> +34 616 337 103
>>
>
>
