incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: problem with bootstrap
Date Tue, 08 Mar 2011 20:16:32 GMT
I've seen this around a couple of times now. 

One reason to fail if there are not enough nodes to meet the replication factor is that CL.ALL
requests cannot be processed. You could argue that we can get into that state
at any time if a node is down. But this error means there have never been enough nodes in the ring,
regardless of their up/down state, so Cassandra will never be able to meet the replication
guarantees for the keyspace. E.g. if you kicked off a repair, it would not leave the cluster
in the expected state.
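To illustrate the point that the check is against ring membership rather than liveness, here is a minimal Java sketch of SimpleStrategy-style placement. It is a simplified model, not Cassandra's actual `SimpleStrategy` code: it assumes one token per node, `long` tokens instead of Cassandra's real token type, and hypothetical node names `node-a`/`node-b`. Replicas for a key are found by walking the ring clockwise from the key's token; if the RF exceeds the number of distinct ring members, it throws the same kind of error seen in the logs. Since a CL.ALL request waits for every one of the RF replicas, it can never succeed on such a ring.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.TreeMap;

// Simplified model of SimpleStrategy-style replica placement (not Cassandra's code).
// Assumptions: long tokens, one token per node, hypothetical node names.
public class RingSketch {

    // ring maps token -> endpoint, kept sorted by token like the token ring.
    static List<String> naturalEndpoints(TreeMap<Long, String> ring, long keyToken, int rf) {
        // Counts every ring member, whether it is currently up or down.
        int endpoints = new HashSet<>(ring.values()).size();
        if (rf > endpoints) {
            throw new IllegalStateException("replication factor (" + rf
                    + ") exceeds number of endpoints (" + endpoints + ")");
        }
        List<String> replicas = new ArrayList<>();
        // The primary replica owns the first token >= keyToken, wrapping around the ring.
        Long token = ring.ceilingKey(keyToken);
        if (token == null) {
            token = ring.firstKey();
        }
        // Continue clockwise until rf distinct endpoints have been collected.
        while (replicas.size() < rf) {
            String endpoint = ring.get(token);
            if (!replicas.contains(endpoint)) {
                replicas.add(endpoint);
            }
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey();
            }
        }
        return replicas;
    }

    // CL.ALL blocks for acknowledgements from all rf replicas, so it can only
    // ever succeed if the ring has at least rf members.
    static boolean allEverSatisfiable(int rf, int ringMembers) {
        return ringMembers >= rf;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(0L, "node-a");
        ring.put(100L, "node-b");

        System.out.println(naturalEndpoints(ring, 42L, 2));
        System.out.println(allEverSatisfiable(3, ring.size()));
        try {
            naturalEndpoints(ring, 42L, 3);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `[node-b, node-a]`, then `false`, then `replication factor (3) exceeds number of endpoints (2)`: with two ring members, an RF of 3 cannot be placed even when both nodes are up.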

Not sure if this is the official reason, just my thinking. And there may be other reasons.


Sounds like you've made progress though. 

Cheers
Aaron

On 9/03/2011, at 4:23 AM, Patrik Modesto wrote:

> Hi,
> 
> I have a small test cluster of 2 servers, both running successfully
> Cassandra 0.7.3. I have three keyspaces, two with RF1, one with RF3. Now
> when I try to bootstrap a 3rd server (empty initial_token,
> auto_bootstrap: true), I get this exception on the new server.
> 
> INFO 23:13:43,229 Joining: getting bootstrap token
> INFO 23:13:43,258 New token will be
> 127097301048222781806986236020167142093 to assume load from
> /10.0.18.99
> INFO 23:13:43,259 switching in a fresh Memtable for LocationInfo at
> CommitLogContext(file='/mnt/disk8/cassandra/data/CommitLog-1299622332896.log',
> position=1578072)
> INFO 23:13:43,259 Enqueuing flush of
> Memtable-LocationInfo@1526249359(106 bytes, 3 operations)
> INFO 23:13:43,259 Writing Memtable-LocationInfo@1526249359(106 bytes,
> 3 operations)
> INFO 23:13:43,276 Completed flushing
> /mnt/disk3/cassandra/data/system/LocationInfo-f-2-Data.db (211 bytes)
> INFO 23:13:43,277 Joining: sleeping 30000 ms for pending range setup
> INFO 23:14:13,277 Bootstrapping
> java.lang.reflect.InvocationTargetException
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:160)
> Caused by: java.lang.IllegalStateException: replication factor (3)
> exceeds number of endpoints (2)
>        at org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
>        at org.apache.cassandra.locator.AbstractReplicationStrategy.getRangeAddresses(AbstractReplicationStrategy.java:212)
>        at org.apache.cassandra.dht.BootStrapper.getRangesWithSources(BootStrapper.java:198)
>        at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:83)
>        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:525)
>        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:453)
>        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:403)
>        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:194)
>        at org.apache.cassandra.service.AbstractCassandraDaemon.init(AbstractCassandraDaemon.java:217)
>        ... 5 more
> Cannot load daemon
> Service exit with a return value of 3
> 
> On the other servers I get:
> 
> ERROR 15:54:24,670 Error in ThreadPoolExecutor
> java.lang.IllegalStateException: replication factor (3) exceeds number
> of endpoints (2)
>        at org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
>        at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:929)
>        at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:895)
>        at org.apache.cassandra.service.StorageService.handleStateLeaving(StorageService.java:797)
>        at org.apache.cassandra.service.StorageService.onChange(StorageService.java:651)
>        at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:763)
>        at org.apache.cassandra.gms.Gossiper.applyApplicationStateLocally(Gossiper.java:753)
>        at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:670)
>        at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> ERROR 15:54:24,672 Fatal exception in thread Thread[GossipStage:1,5,main]
> java.lang.IllegalStateException: replication factor (3) exceeds number
> of endpoints (2)
>        at org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
>        at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:929)
>        at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:895)
>        at org.apache.cassandra.service.StorageService.handleStateLeaving(StorageService.java:797)
>        at org.apache.cassandra.service.StorageService.onChange(StorageService.java:651)
>        at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:763)
>        at org.apache.cassandra.gms.Gossiper.applyApplicationStateLocally(Gossiper.java:753)
>        at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:670)
>        at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> 
> 
> Removing the keyspace with RF3 fixed the problem and bootstrap went
> well, but why is there a problem with fewer nodes than the RF? I can
> imagine a situation where I would need to remove nodes from the cluster and
> end up with fewer servers than the maximum RF used.
> I'd then be unable to bootstrap new servers into the cluster.
> Removing the keyspace is not an option in a production environment.
> 
> Thanks,
> Patrik
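Patrik's scenario (shrinking the cluster below the largest RF) can be checked before any node is removed. The following is a minimal Java sketch, assuming the per-keyspace replication factors have already been collected somehow; the keyspace names `ks_a`, `ks_b`, `ks_c` and the helper `keyspacesAtRisk` are hypothetical. It lists every keyspace whose RF would exceed the number of nodes left in the ring, which is the condition behind the "replication factor exceeds number of endpoints" errors above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check before removing nodes from a cluster.
public class DecommissionCheck {

    // Returns the keyspaces whose replication factor would exceed the ring size
    // after removal, in the map's iteration order.
    static List<String> keyspacesAtRisk(Map<String, Integer> rfByKeyspace, int nodesAfterRemoval) {
        List<String> atRisk = new ArrayList<>();
        for (Map.Entry<String, Integer> e : rfByKeyspace.entrySet()) {
            if (e.getValue() > nodesAfterRemoval) {
                atRisk.add(e.getKey());
            }
        }
        return atRisk;
    }

    public static void main(String[] args) {
        // Hypothetical keyspaces mirroring the setup described: two with RF 1, one with RF 3.
        Map<String, Integer> rf = new LinkedHashMap<>();
        rf.put("ks_a", 1);
        rf.put("ks_b", 1);
        rf.put("ks_c", 3);

        System.out.println(keyspacesAtRisk(rf, 2));
        System.out.println(keyspacesAtRisk(rf, 3));
    }
}
```

With two nodes left, `main` prints `[ks_c]`; with three it prints `[]`.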

