incubator-cassandra-user mailing list archives

From Marc Canaleta <mcanal...@gmail.com>
Subject Re: New nodes won't bootstrap on .66
Date Sun, 07 Nov 2010 17:57:12 GMT
Hi,

Did you solve this problem? I'm having the same problem. I'm trying to
bootstrap a third node in a 0.6.6 cluster. It has two keyspaces: Keyspace1
and KeyspaceLogs, both with replication factor 2.

It starts bootstrapping and receives some streams, but then keeps waiting for
more streams. I enabled debug logging. These lines may be useful:

DEBUG [main] 2010-11-07 17:39:50,052 BootStrapper.java (line 70) Beginning bootstrap process
DEBUG [main] 2010-11-07 17:39:50,082 StorageService.java (line 160) Added /10.204.93.16/Keyspace1 as a bootstrap source
...
DEBUG [main] 2010-11-07 17:39:50,090 StorageService.java (line 160) Added /10.204.93.16/KeyspaceLogs as a bootstrap source
... (streaming messages)
DEBUG [Thread-56] 2010-11-07 17:45:51,706 StorageService.java (line 171) Removed /10.204.93.16/Keyspace1 as a bootstrap source; remaining is [/10.204.93.16]
...
(and never ends).

It seems it is waiting for [/10.204.93.16] when it should be waiting for
/10.204.93.16/KeyspaceLogs.
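
One way to check which sources are really outstanding is to replay the "Added" and "Removed" messages from the debug log. The following is a minimal sketch, not part of Cassandra: the function name `pending_sources` and the regular expression are hypothetical, and it assumes the log contains exactly the two message shapes quoted above. If the "remaining is" list prints only endpoints rather than endpoint/keyspace pairs (an assumption, not verified against the 0.6.6 source), then `[/10.204.93.16]` would be consistent with KeyspaceLogs still being pending.

```python
import re
from collections import defaultdict

# Matches the "Added"/"Removed ... as a bootstrap source" messages quoted above.
# The optional whitespace after "/" tolerates mail-client line wrapping.
SOURCE_RE = re.compile(r"(Added|Removed) /\s*([\d.]+)/(\w+) as a bootstrap source")


def pending_sources(log_text):
    """Return {endpoint: {keyspaces}} that were added but never removed."""
    # Collapse all whitespace so wrapped log lines are matched as one line.
    flat = " ".join(log_text.split())
    pending = defaultdict(set)
    for action, endpoint, keyspace in SOURCE_RE.findall(flat):
        if action == "Added":
            pending[endpoint].add(keyspace)
        else:
            pending[endpoint].discard(keyspace)
    # Drop endpoints that have nothing left to wait for.
    return {ep: ks for ep, ks in pending.items() if ks}


# The three relevant messages from this post, wrapped as they were in the mail.
sample = """DEBUG [main] 2010-11-07 17:39:50,082 StorageService.java (line 160) Added /
10.204.93.16/Keyspace1 as a bootstrap source
DEBUG [main] 2010-11-07 17:39:50,090 StorageService.java (line 160) Added /
10.204.93.16/KeyspaceLogs as a bootstrap source
DEBUG [Thread-56] 2010-11-07 17:45:51,706 StorageService.java (line 171)
Removed /10.204.93.16/Keyspace1 as a bootstrap source; remaining is [/
10.204.93.16]"""

print(pending_sources(sample))  # {'10.204.93.16': {'KeyspaceLogs'}}
```

Run against the full system log of the bootstrapping node, this shows at a glance whether a keyspace stream never completed.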

The third node is 64-bit, while the two existing nodes are 32-bit. Could
this be a problem?

Thank you.


2010/10/28 Dimitry Lvovsky <dimitry@reviewpro.com>

> Maybe your <StoragePort>7000</StoragePort> is being blocked by iptables
> or some other firewall, or maybe you have it bound (<ListenAddress> tag) to
> localhost instead of an IP address.
>
> Hope this helps,
> Dimitry.
>
>
>
> On Thu, Oct 28, 2010 at 5:35 PM, Thibaut Britz <
> thibaut.britz@trendiction.com> wrote:
>
>> Hi,
>>
>> I have the same problem with 0.6.5
>>
>> New nodes will hang forever in bootstrap mode (no streams are being
>> opened) and the receiver thread just waits for data forever:
>>
>>
>>  INFO [Thread-53] 2010-10-27 20:33:37,399 SSTableReader.java (line 120) Sampling index for /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
>>  INFO [Thread-53] 2010-10-27 20:33:37,444 StreamCompletionHandler.java (line 64) Streaming added /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
>>
>> Stack trace:
>>
>> "pool-1-thread-53" prio=10 tid=0x00000000412f2800 nid=0x215c runnable [0x00007fd7cf217000]
>>    java.lang.Thread.State: RUNNABLE
>>         at java.net.SocketInputStream.socketRead0(Native Method)
>>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>>         - locked <0x00007fd7e77e0520> (a java.io.BufferedInputStream)
>>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:126)
>>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>>         at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:314)
>>         at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:262)
>>         at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:192)
>>         at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1154)
>>         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>>
>> On Thu, Oct 28, 2010 at 12:44 PM, aaron morton <aaron@thelastpickle.com> wrote:
>>
>>> The best approach is to manually select the tokens; see the Load
>>> Balancing section of http://wiki.apache.org/cassandra/Operations
>>>
>>> Also, are there any log messages in the existing nodes or the new one which
>>> mention each other?
>>>
>>> Is this a production system? Is it still running?
>>>
>>> Sorry there is not a lot to go on; it sounds like you've done the right
>>> thing. I'm assuming things like the Cluster Name, seed list and port numbers
>>> are set correctly, as the new node got some data.
>>>
>>> You'll need to dig through the logs a bit more to see that the
>>> bootstrapping started and what the last message it logged was.
>>>
>>> Good Luck.
>>> Aaron
>>>
>>> On 27 Oct 2010, at 22:40, Dimitry Lvovsky wrote:
>>>
>>> Hi Aaron,
>>> Thanks for your reply.
>>>
>>> We still haven't solved this unfortunately.
>>>
>>>> How did you start the bootstrap for the .18 node?
>>>
>>>
>>> Standard way: we set "AutoBootstrap" to true and added all the servers
>>> from the working ring as seeds.
>>>
>>>
>>>> Was it the .18 or the .17 node you tried to add
>>>
>>>
>>> We first tried adding .17. It streamed for a while, took on 50 GB of
>>> load, and stopped streaming, but then didn't enter the ring.  We left it for
>>> a few days to see if it would come in, but no luck.  After that we ran the
>>> decommission and removeToken operations (in that order).
>>> Since we couldn't get .17 in, we tried again with .18.  Before doing so we
>>> increased RpcTimeoutInMillis from 1000 to 10000, having read that this
>>> setting may cause the problem of nodes not entering the ring.  It's been going
>>> since Friday and still, like .17, won't come into the ring.
>>>
>>>> Does it have a token in the config or did you use nodetool move to set it?
>>>
>>> No, we didn't manually set the token in the config; rather, we were
>>> relying on the token being assigned during bootstrap by the
>>> RandomPartitioner.
>>>
>>> Again thanks for the help.
>>>
>>> Dimitry.
>>>
>>>
>>>
>>> On Tue, Oct 26, 2010 at 10:14 PM, Aaron Morton <aaron@thelastpickle.com> wrote:
>>>
>>>> Dimitry, did you get anywhere with this?
>>>>
>>>> Was it the .18 or the .17 node you tried to add? How did you start the
>>>> bootstrap for the .18 node? Does it have a token in the config or did you
>>>> use nodetool move to set it?
>>>>
>>>> I had a quick look at the code. AFAIK the message about removing the fat
>>>> client is logged when the node does not have a record of the token the other
>>>> node has.
>>>>
>>>> Aaron
>>>>
>>>> On 26 Oct, 2010,at 10:42 PM, Dimitry Lvovsky <dimitry@reviewpro.com>
>>>> wrote:
>>>>
>>>> Hi All,
>>>> We recently upgraded from 0.6.5 to 0.6.6, after which we tried adding a new
>>>> node to our cluster. We left it bootstrapping and after 3 days it still
>>>> refused to join the ring. The strange thing is that nodetool info shows 50GB
>>>> of load and nodetool ring shows that it sees the rest of the ring, which it is
>>>> not part of. We tried the process again with another server, with the
>>>> same result as before:
>>>>
>>>>
>>>> //from machine 192.168.2.18
>>>>
>>>>
>>>> /opt/cassandra/bin/nodetool -h localhost -p 8999 info
>>>> 131373516047318302934572185119435768941
>>>> Load : 52.85 GB
>>>> Generation No : 1287761987
>>>> Uptime (seconds) : 323157
>>>> Heap Memory (MB) : 795.42 / 1945.63
>>>>
>>>>
>>>> /opt/cassandra/bin/nodetool -h localhost -p 8999 ring
>>>> Address       Status  Load      Range                                     Ring
>>>>                                 158573510920250391466717289405976537674
>>>> 192.168.2.22  Up      59.45 GB  28203205416427384773583427414698832202    |<--|
>>>> 192.168.2.23  Up      44.95 GB  60562227403709245514637766500430120055    |   |
>>>> 192.168.2.20  Up      47.15 GB  104160057322065544623939416372654814065   |   |
>>>> 192.168.2.21  Up      61.04 GB  158573510920250391466717289405976537674   |-->|
>>>>
>>>> /opt/cassandra/bin/nodetool -h localhost -p 8999 streams
>>>> Mode: Bootstrapping
>>>> Not sending any streams.
>>>> Not receiving any streams.
>>>>
>>>>
>>>> What's more, while looking at the log of one of the nodes I see gossip
>>>> messages from 192.168.2.17, the first node we tried to add to the cluster,
>>>> which is not running at the time of the log messages:
>>>> INFO [Timer-0] 2010-10-26 02:13:20,340 Gossiper.java (line 406) FatClient /192.168.2.17 has been silent for 3600000ms, removing from gossip
>>>> INFO [GMFD:1] 2010-10-26 02:13:51,398 Gossiper.java (line 591) Node /192.168.2.17 is now part of the cluster
>>>>
>>>>
>>>> Thanks in advance for the help,
>>>> Dimitry
>>>>
>>>>
>>>
>>>
>>> --
>>> Dimitry Lvovsky
>>> Director of Engineering
>>> ReviewPro
>>> www.reviewpro.com
>>> +34 616 337 103
>>>
>>>
>>>
>>
>
>
> --
> Dimitry Lvovsky
> Director of Engineering
> ReviewPro
> www.reviewpro.com
> +34 616 337 103
>
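
The firewall suggestion quoted above can be checked from the new node with a plain TCP connection test. This is a minimal sketch under stated assumptions: `can_connect` is a hypothetical helper name, 7000 is the StoragePort value mentioned in the quoted reply, and a successful connect only shows that the port is reachable, not that streaming will work.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example use from the bootstrapping node (address taken from the log above):
#   can_connect("10.204.93.16", 7000)
```

The same check run in the other direction, from an existing node to the new one, would show whether the new node's ListenAddress is bound to an address the rest of the ring can reach.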

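For the advice quoted above to select tokens manually, evenly spaced RandomPartitioner tokens can be computed as i * 2**127 / N for a ring of N nodes. The sketch below is illustrative only: `balanced_tokens` is a hypothetical name, and it assumes the 0 to 2**127 token space of RandomPartitioner.

```python
def balanced_tokens(node_count):
    """Evenly spaced tokens for RandomPartitioner: i * 2**127 / node_count."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps the full precision of the 127-bit token space.
    return [i * (2 ** 127) // node_count for i in range(node_count)]


# The ring quoted above has four nodes; adding one more makes five.
for token in balanced_tokens(5):
    print(token)
```

Existing nodes would still have to be moved to their new positions for the ring to end up balanced; the list only gives the target values.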