cassandra-user mailing list archives

From Edmond Lau <edm...@ooyala.com>
Subject Re: on bootstrapping a node
Date Thu, 29 Oct 2009 23:42:43 GMT
I'm not able to bootstrap a new node on either 0.4.1 or trunk.  I
started a simple two-node cluster with a replication factor of 2 and
then bootstrapped a third node (using -b in 0.4.1 and AutoBootstrap on
trunk).
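For context, the bootstrap settings involved look roughly like this in
storage-conf.xml; this is only a sketch, and the element names are assumed
from the 0.4/trunk-era configuration format rather than verified:

```xml
<!-- Sketch of the relevant storage-conf.xml settings; element names
     assumed from the 0.4/trunk-era configuration format. -->

<!-- trunk: have a brand-new node bootstrap itself on first start -->
<AutoBootstrap>true</AutoBootstrap>

<!-- optional: pin the node's position on the ring; when left empty,
     bootstrap derives a token from the cluster's load information -->
<InitialToken></InitialToken>
```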

In 0.4.1, I do observe some writes going to the new node as expected,
but then the BOOT-STRAPPER thread throws an NPE and the node never
shows up in nodeprobe ring.  I believe this is fixed by CASSANDRA-425:

DEBUG [BOOT-STRAPPER:1] 2009-10-29 22:56:41,272 BootStrapper.java
(line 100) Total number of old ranges 2
DEBUG [BOOT-STRAPPER:1] 2009-10-29 22:56:41,274 BootStrapper.java
(line 83) Exception was generated at : 10/29/2009 22:56:41 on thread
BOOT-STRAPPER:1

java.lang.NullPointerException
        at org.apache.cassandra.dht.Range.contains(Range.java:105)
        at org.apache.cassandra.dht.LeaveJoinProtocolHelper.getRangeSplitRangeMapping(LeaveJoinProtocolHelper.java:72)
        at org.apache.cassandra.dht.BootStrapper.getRangesWithSourceTarget(BootStrapper.java:105)
        at org.apache.cassandra.dht.BootStrapper.run(BootStrapper.java:73)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
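To illustrate the failure mode, here is a minimal, self-contained sketch
(not the actual Cassandra source) of the kind of null dereference a range
containment check like Range.contains can hit when a bootstrapping node's
token has not been assigned yet; the method shape here is illustrative only:

```java
import java.math.BigInteger;

public class RangeSketch {
    // Simplified containment check over token boundaries (left, right].
    // BigInteger.compareTo dereferences its argument, so a null token
    // (e.g. for a node whose token is not yet known during bootstrap)
    // throws a NullPointerException, as in the trace above.
    static boolean contains(BigInteger left, BigInteger right, BigInteger token) {
        return left.compareTo(token) < 0 && token.compareTo(right) <= 0;
    }

    public static void main(String[] args) {
        // A known token works as expected.
        System.out.println(contains(BigInteger.ZERO, BigInteger.TEN,
                                    BigInteger.valueOf(5))); // true
        try {
            // A token that was never set reproduces the NPE.
            contains(BigInteger.ZERO, BigInteger.TEN, null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in the BOOT-STRAPPER trace");
        }
    }
}
```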

On trunk, the third node never receives any writes and just sits
there doing nothing.  It also never shows up in nodeprobe ring:

 INFO [main] 2009-10-29 23:15:24,934 StorageService.java (line 264)
Starting in bootstrap mode (first, sleeping to get load information)
 INFO [GMFD:1] 2009-10-29 23:15:26,423 Gossiper.java (line 634) Node
/172.16.130.130 has now joined.
DEBUG [GMFD:1] 2009-10-29 23:15:26,424 StorageService.java (line 389)
CHANGE IN STATE FOR /172.16.130.130 - has token
129730098012431089662630620415811546756
 INFO [GMFD:1] 2009-10-29 23:15:26,426 Gossiper.java (line 634) Node
/172.16.130.129 has now joined.
DEBUG [GMFD:1] 2009-10-29 23:15:26,426 StorageService.java (line 389)
CHANGE IN STATE FOR /172.16.130.129 - has token
30741330848943310678704865619376516001
DEBUG [Timer-0] 2009-10-29 23:15:26,930 LoadDisseminator.java (line
39) Disseminating load info ...
DEBUG [GMFD:1] 2009-10-29 23:18:39,451 StorageService.java (line 434)
InetAddress /172.16.130.130 just recovered from a partition. Sending
hinted data.
DEBUG [HINTED-HANDOFF-POOL:1] 2009-10-29 23:18:39,454
HintedHandOffManager.java (line 186) Started hinted handoff for
endPoint /172.16.130.130
DEBUG [HINTED-HANDOFF-POOL:1] 2009-10-29 23:18:39,456
HintedHandOffManager.java (line 225) Finished hinted handoff for
endpoint /172.16.130.130
DEBUG [GMFD:1] 2009-10-29 23:18:39,954 StorageService.java (line 434)
InetAddress /172.16.130.129 just recovered from a partition. Sending
hinted data.
DEBUG [HINTED-HANDOFF-POOL:1] 2009-10-29 23:18:39,955
HintedHandOffManager.java (line 186) Started hinted handoff for
endPoint /172.16.130.129
DEBUG [HINTED-HANDOFF-POOL:1] 2009-10-29 23:18:39,956
HintedHandOffManager.java (line 225) Finished hinted handoff for
endpoint /172.16.130.129

Bootstrapping the third node after manually giving it an initial
token led to an AssertionError:

 INFO [main] 2009-10-29 23:25:11,720 SystemTable.java (line 125) Saved
Token not found. Using 0
DEBUG [main] 2009-10-29 23:25:11,878 MessagingService.java (line 203)
Starting to listen on v31.vv.prod.ooyala.com/172.16.130.131
 INFO [main] 2009-10-29 23:25:11,933 StorageService.java (line 264)
Starting in bootstrap mode (first, sleeping to get load information)
 INFO [GMFD:1] 2009-10-29 23:25:13,679 Gossiper.java (line 634) Node
/172.16.130.130 has now joined.
DEBUG [GMFD:1] 2009-10-29 23:25:13,680 StorageService.java (line 389)
CHANGE IN STATE FOR /172.16.130.130 - has token
50846833567878089067494666696176925951
 INFO [GMFD:1] 2009-10-29 23:25:13,682 Gossiper.java (line 634) Node
/172.16.130.129 has now joined.
DEBUG [GMFD:1] 2009-10-29 23:25:13,682 StorageService.java (line 389)
CHANGE IN STATE FOR /172.16.130.129 - has token
44233547425983959380881840716972243602
DEBUG [Timer-0] 2009-10-29 23:25:13,929 LoadDisseminator.java (line
39) Disseminating load info ...
ERROR [main] 2009-10-29 23:25:43,754 CassandraDaemon.java (line 184)
Exception encountered during startup.
java.lang.AssertionError
        at org.apache.cassandra.dht.BootStrapper.<init>(BootStrapper.java:84)
        at org.apache.cassandra.service.StorageService.start(StorageService.java:267)
        at org.apache.cassandra.service.CassandraServer.start(CassandraServer.java:72)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:94)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:166)

Thoughts?

Edmond

On Wed, Oct 28, 2009 at 2:24 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> On Wed, Oct 28, 2009 at 1:15 PM, Edmond Lau <edmond@ooyala.com> wrote:
>> Sounds reasonable.  Until CASSANDRA-435 is complete, there's
>> currently no way to take down a node and have it removed from the
>> list of nodes responsible for the data in its token range, correct?
>> All other nodes will just assume that it's temporarily unavailable?
>
> Right.
>
>> Assume that we had the ability to permanently remove a node.  Would
>> modifying the token on an existing node and restarting it with
>> bootstrapping somehow be incorrect, or merely not performant because
>> we'll be performing lazy repair on most reads until the node is up
>> to date?
>
> If you permanently remove a node, wipe its data directory, and restart
> it, it's effectively a new node, so everything works fine.  If you
> don't wipe its data directory it won't bootstrap (and it will ignore a
> new token in the configuration file in favor of the one it stored in
> the system table) since it will say "hey, I must have crashed and
> restarted.  Here I am again guys!"
>
> Bootstrap is for new nodes.  Don't try to be too clever. :)
>
>> if I wanted to
>> migrate my cluster to a completely new set of machines.  I would then
>> bootstrap all the new nodes in the new cluster, and then decommission
>> my old nodes one by one (assuming
>> https://issues.apache.org/jira/browse/CASSANDRA-435 was done).  After
>> the migration, all my nodes would've been bootstrapped.
>
> Sure.
>
> -Jonathan
>
