Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Message-ID: <20591334.98121284049414932.JavaMail.jira@thor>
Date: Thu, 9 Sep 2010 12:23:34 -0400 (EDT)
From: "Gary Dusbabek (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Subject: [jira] Updated: (CASSANDRA-1467) replication factor exceeds number
 of endpoints, when attempting to join a new node (but cluster has enough
 running nodes to fulfill RF)
In-Reply-To: <12779662.23951283556333617.JavaMail.jira@thor>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/CASSANDRA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Dusbabek updated CASSANDRA-1467:
-------------------------------------

        Fix Version/s: 0.6.6
    Affects Version/s: 0.6.5

> replication factor exceeds number of endpoints, when attempting to join a new node (but cluster has enough running nodes to fulfill RF)
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1467
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1467
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6.5, 0.7 beta 1
>         Environment: I'm using 0.7 beta1. Built from: http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.7.0/apache-cassandra-0.7.0-beta1-src.tar.gz  using dpkg-buildpackage from it.
>            Reporter: Josep M. Blanquer
>            Assignee: Gary Dusbabek
>             Fix For: 0.6.6, 0.7 beta 2
>
>
> What happens here the following:
> * given a healthy running cluster of 2 nodes (it used to be 3 but I manually killed one down)
> * with a Keyspace having a ReplicationFactor of 2
> * (this cluster is operational, loaded with data and working well)
>  as soon as I want to bring up a new 3rd node:
> * the node is detected by the current cluster
> * but as soon as it tries to initiate bootstrap sequence it dies with:  replication factor (2) exceeds number of endpoints (1)
> I believe this is similar to #1343, but not quite the same, since the keyspace and everything is already created, and I'm not attempting any modification. This is purely bringing a new node up.
> One extra tidbit of information, in case it's important...maybe it is not: 
> I have only set 1 seed node configured (this is a test setup). And when the new node starts coming up (before the crash) its nodetool ring only reports the seed one.
> From the new node coming up (before it crashes):
> root@node1-3:~# nodetool -h localhost -p 8080 ring
> Address         Status State   Load            Token                                       
> 10.250.106.111  Up     Normal  116.3 GB        0          < = this the only configured seed node
> From one of the other running nodes (this is the real status of the cluster):
> root@node1-2:~# nodetool -h localhost -p 8080 ring
> Address         Status State   Load            Token                                       
>                                        113416112894748789872342756657008344878    
> 10.250.106.111  Up     Normal  116.3 GB        0                                           
> 10.215.195.81   Up     Normal  116.31 GB       56713727820156410577229101238628035242      
> 10.246.65.221   Up     Joining 7.63 KB         113416112894748789872342756657008344878     
> Here's the full boot trace of the node is trying to join:
> Java HotSpot(TM) Client VM warning: Can't detect initial thread stack location - find_vma failed
> Create RMI registry on port 8081
> Get the platform's MBean server
> Initialize the environment map
> Create an RMI connector server
> Start the RMI connector server on port 8081
>  INFO 22:32:50,448 Loading settings from /etc/cassandra/cassandra.yaml
> DEBUG 22:32:50,537 Syncing log with a period of 10000
>  INFO 22:32:50,538 DiskAccessMode 'auto' determined to be standard, indexAccessMode is standard
> DEBUG 22:32:50,548 setting auto_bootstrap to true
> DEBUG 22:32:50,702 Starting CFS Statistics
> DEBUG 22:32:50,710 key cache capacity for Statistics is 1
> DEBUG 22:32:50,711 Starting CFS Schema
> DEBUG 22:32:50,712 key cache capacity for Schema is 1
> DEBUG 22:32:50,712 Starting CFS Migrations
> DEBUG 22:32:50,713 key cache capacity for Migrations is 1
> DEBUG 22:32:50,713 Starting CFS LocationInfo
> DEBUG 22:32:50,713 key cache capacity for LocationInfo is 1
> DEBUG 22:32:50,714 Starting CFS HintsColumnFamily
> DEBUG 22:32:50,714 key cache capacity for HintsColumnFamily is 1
>  INFO 22:32:50,736 Couldn't detect any schema definitions in local storage.
>  INFO 22:32:50,737 Found table data in data directories. Consider using JMX to call org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
> DEBUG 22:32:50,738 opening keyspace system
>  INFO 22:32:50,757 Cassandra version: 
>  INFO 22:32:50,757 Thrift API version: 10.0.0
>  INFO 22:32:50,758 Saved Token not found. Using 113416112894748789872342756657008344878
>  INFO 22:32:50,758 Saved ClusterName not found. Using SOMETHING 
>  INFO 22:32:50,763 Creating new commitlog segment /mnt/cassandra/commitlog/CommitLog-1283553170763.log
> DEBUG 22:32:50,767 Estimating compactions for LocationInfo
> DEBUG 22:32:50,768 Estimating compactions for HintsColumnFamily
> DEBUG 22:32:50,768 Estimating compactions for Migrations
> DEBUG 22:32:50,768 Estimating compactions for Schema
> DEBUG 22:32:50,768 Estimating compactions for Statistics
> DEBUG 22:32:50,769 Checking to see if compaction of LocationInfo would be useful
> DEBUG 22:32:50,769 Checking to see if compaction of HintsColumnFamily would be useful
> DEBUG 22:32:50,769 Checking to see if compaction of Migrations would be useful
> DEBUG 22:32:50,769 Checking to see if compaction of Schema would be useful
> DEBUG 22:32:50,769 Checking to see if compaction of Statistics would be useful
>  INFO 22:32:50,779 switching in a fresh Memtable for LocationInfo at CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=276)
>  INFO 22:32:50,782 Enqueuing flush of Memtable-LocationInfo@18721294(192 bytes, 4 operations)
>  INFO 22:32:50,783 Writing Memtable-LocationInfo@18721294(192 bytes, 4 operations)
>  INFO 22:32:50,897 Completed flushing /mnt/ebs/data/system/LocationInfo-e-1-Data.db
> DEBUG 22:32:50,898 Checking to see if compaction of LocationInfo would be useful
> DEBUG 22:32:50,898 Discarding 0
> DEBUG 22:32:50,899 discard completed log segments for CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=276), column family 0.
> DEBUG 22:32:50,899 Marking replay position 276 on commit log CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log)
>  INFO 22:32:50,908 Starting up server gossip
>  INFO 22:32:50,928 Joining: getting load information
>  INFO 22:32:50,928 Sleeping 90000 ms to wait for load information...
> DEBUG 22:32:50,940 attempting to connect to node1-1.domain.com/10.250.106.111
> DEBUG 22:32:51,044 attempting to connect to node1-1.domain.com/10.250.106.111
> DEBUG 22:32:51,045 attempting to connect to /10.215.195.81
>  INFO 22:32:51,051 Node /10.215.195.81 is now part of the cluster
> DEBUG 22:32:51,051 Resetting pool for /10.215.195.81
> DEBUG 22:32:51,052 Token 113416112894748789872342756657008344877 removed manually (endpoint was unknown)
>  INFO 22:32:51,052 Node /10.250.106.111 is now part of the cluster
> DEBUG 22:32:51,053 Resetting pool for /10.250.106.111
> DEBUG 22:32:51,053 Node /10.250.106.111 state normal, token 0
> DEBUG 22:32:51,053 clearing cached endpoints
> DEBUG 22:32:51,171 Applying AddKeyspace from /10.250.106.111
> DEBUG 22:32:51,188 Applying migration 77e97d6c-b625-11df-8596-318df8b646e8
>  INFO 22:32:51,189 switching in a fresh Memtable for Migrations at CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=7417)
>  INFO 22:32:51,189 Enqueuing flush of Memtable-Migrations@18093512(4938 bytes, 1 operations)
>  INFO 22:32:51,189 Writing Memtable-Migrations@18093512(4938 bytes, 1 operations)
>  INFO 22:32:51,194 switching in a fresh Memtable for Schema at CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=7417)
>  INFO 22:32:51,194 Enqueuing flush of Memtable-Schema@27402470(1784 bytes, 3 operations)
>  INFO 22:32:51,274 Completed flushing /mnt/ebs/data/system/Migrations-e-1-Data.db
> DEBUG 22:32:51,274 Checking to see if compaction of Migrations would be useful
> DEBUG 22:32:51,275 Discarding 2
> DEBUG 22:32:51,275 discard completed log segments for CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=7417), column family 2
> .
> DEBUG 22:32:51,275 Marking replay position 7417 on commit log CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log)
>  INFO 22:32:51,275 Writing Memtable-Schema@27402470(1784 bytes, 3 operations)
>  INFO 22:32:51,388 Completed flushing /mnt/ebs/data/system/Schema-e-1-Data.db
> DEBUG 22:32:51,389 Checking to see if compaction of Schema would be useful
> DEBUG 22:32:51,389 Discarding 3
> DEBUG 22:32:51,389 discard completed log segments for CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=7417), column family 3
> .
> DEBUG 22:32:51,389 Marking replay position 7417 on commit log CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log)
> DEBUG 22:32:51,406 Starting CFS MyColumnFamily
> DEBUG 22:32:51,406 key cache capacity for MyColumnFamily is 200000
>  INFO 22:32:51,407 Creating new commitlog segment /mnt/cassandra/commitlog/CommitLog-1283553171407.log
> DEBUG 22:32:51,604 attempting to connect to node1-1.domain.com/10.250.106.111
>  INFO 22:32:51,622 InetAddress /10.215.195.81 is now UP
>  INFO 22:32:51,623 InetAddress /10.250.106.111 is now UP
>  INFO 22:32:51,623 Started hinted handoff for endpoint /10.215.195.81
>  INFO 22:32:51,631 Finished hinted handoff of 0 rows to endpoint /10.215.195.81
>  INFO 22:32:51,632 Started hinted handoff for endpoint /10.250.106.111
>  INFO 22:32:51,632 Finished hinted handoff of 0 rows to endpoint /10.250.106.111
> DEBUG 22:32:51,918 GC for ParNew: 14 ms, 20561696 reclaimed leaving 149257720 used; max is 902627328
> DEBUG 22:32:51,919 GC for ConcurrentMarkSweep: 73 ms, 4178480 reclaimed leaving 8520856 used; max is 902627328
> DEBUG 22:32:52,924 Disseminating load info ...
> DEBUG 22:32:53,929 attempting to connect to /10.215.195.81
> DEBUG 22:32:54,925 GC for ConcurrentMarkSweep: 73 ms, 212034048 reclaimed leaving 15214608 used; max is 902627328
> DEBUG 22:33:52,931 Disseminating load info ...
> DEBUG 22:34:20,929 ... got load info
>  INFO 22:34:20,929 Joining: getting bootstrap token
> DEBUG 22:34:20,931 token manually specified as 113416112894748789872342756657008344878
>  INFO 22:34:20,932 Joining: sleeping 30000 ms for pending range setup
>  INFO 22:34:50,935 Bootstrapping
> DEBUG 22:34:50,935 Beginning bootstrap process
> java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:160)
> Caused by: java.lang.IllegalStateException: replication factor (2) exceeds number of endpoints (1)
>         at org.apache.cassandra.locator.RackUnawareStrategy.calculateNaturalEndpoints(RackUnawareStrategy.java:57)
>         at org.apache.cassandra.locator.AbstractReplicationStrategy.getRangeAddresses(AbstractReplicationStrategy.java:195)
>         at org.apache.cassandra.dht.BootStrapper.getRangesWithSources(BootStrapper.java:155)
>         at org.apache.cassandra.dht.BootStrapper.startBootstrap(BootStrapper.java:73)
>         at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:467)
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:408)
>         at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:134)
>         at org.apache.cassandra.service.AbstractCassandraDaemon.init(AbstractCassandraDaemon.java:57)
>         ... 5 more
> Cannot load daemon
> Service exit with a return value of 3

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.