cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Josep M. Blanquer (JIRA)" <j...@apache.org>
Subject [jira] Created: (CASSANDRA-1467) replication factor exceeds number of endpoints, when attempting to join a new node (but cluster has enough running nodes to fulfill RF)
Date Fri, 03 Sep 2010 23:25:33 GMT
replication factor exceeds number of endpoints, when attempting to join a new node (but cluster
has enough running nodes to fulfill RF)
---------------------------------------------------------------------------------------------------------------------------------------

                 Key: CASSANDRA-1467
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1467
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.7 beta 1
            Reporter: Josep M. Blanquer


What happens here the following:
* given a healthy running cluster of 2 nodes (it used to be 3 but I manually killed one down)
* with a Keyspace having a ReplicationFactor of 2
* (this cluster is operational, loaded with data and working well)

 as soon as I want to bring up a new 3rd node:
* the node is detected by the current cluster
* but as soon as it tries to initiate bootstrap sequence it dies with:  replication factor
(2) exceeds number of endpoints (1)

I believe this is similar to #1343, but not quite the same, since the keyspace and everything
is already created, and I'm not attempting any modification. This is purely bringing a new
node up.

One extra tidbit of information, in case it's important...maybe it is not: 
I have only set 1 seed node configured (this is a test setup). And when the new node starts
coming up (before the crash) its nodetool ring only reports the seed one.

>From the new node coming up (before it crashes):
root@node1-3:~# nodetool -h localhost -p 8080 ring
Address         Status State   Load            Token                                     
 
10.250.106.111  Up     Normal  116.3 GB        0          < = this the only configured
seed node

>From one of the other running nodes (this is the real status of the cluster):
root@node1-2:~# nodetool -h localhost -p 8080 ring
Address         Status State   Load            Token                                     
 
                                       113416112894748789872342756657008344878    
10.250.106.111  Up     Normal  116.3 GB        0                                         
 
10.215.195.81   Up     Normal  116.31 GB       56713727820156410577229101238628035242    
 
10.246.65.221   Up     Joining 7.63 KB         113416112894748789872342756657008344878   
 


Here's the full boot trace of the node is trying to join:
Java HotSpot(TM) Client VM warning: Can't detect initial thread stack location - find_vma
failed
Create RMI registry on port 8081
Get the platform's MBean server
Initialize the environment map
Create an RMI connector server
Start the RMI connector server on port 8081
 INFO 22:32:50,448 Loading settings from /etc/cassandra/cassandra.yaml
DEBUG 22:32:50,537 Syncing log with a period of 10000
 INFO 22:32:50,538 DiskAccessMode 'auto' determined to be standard, indexAccessMode is standard
DEBUG 22:32:50,548 setting auto_bootstrap to true
DEBUG 22:32:50,702 Starting CFS Statistics
DEBUG 22:32:50,710 key cache capacity for Statistics is 1
DEBUG 22:32:50,711 Starting CFS Schema
DEBUG 22:32:50,712 key cache capacity for Schema is 1
DEBUG 22:32:50,712 Starting CFS Migrations
DEBUG 22:32:50,713 key cache capacity for Migrations is 1
DEBUG 22:32:50,713 Starting CFS LocationInfo
DEBUG 22:32:50,713 key cache capacity for LocationInfo is 1
DEBUG 22:32:50,714 Starting CFS HintsColumnFamily
DEBUG 22:32:50,714 key cache capacity for HintsColumnFamily is 1
 INFO 22:32:50,736 Couldn't detect any schema definitions in local storage.
 INFO 22:32:50,737 Found table data in data directories. Consider using JMX to call org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
DEBUG 22:32:50,738 opening keyspace system
 INFO 22:32:50,757 Cassandra version: 
 INFO 22:32:50,757 Thrift API version: 10.0.0
 INFO 22:32:50,758 Saved Token not found. Using 113416112894748789872342756657008344878
 INFO 22:32:50,758 Saved ClusterName not found. Using SOMETHING 
 INFO 22:32:50,763 Creating new commitlog segment /mnt/cassandra/commitlog/CommitLog-1283553170763.log
DEBUG 22:32:50,767 Estimating compactions for LocationInfo
DEBUG 22:32:50,768 Estimating compactions for HintsColumnFamily
DEBUG 22:32:50,768 Estimating compactions for Migrations
DEBUG 22:32:50,768 Estimating compactions for Schema
DEBUG 22:32:50,768 Estimating compactions for Statistics
DEBUG 22:32:50,769 Checking to see if compaction of LocationInfo would be useful
DEBUG 22:32:50,769 Checking to see if compaction of HintsColumnFamily would be useful
DEBUG 22:32:50,769 Checking to see if compaction of Migrations would be useful
DEBUG 22:32:50,769 Checking to see if compaction of Schema would be useful
DEBUG 22:32:50,769 Checking to see if compaction of Statistics would be useful
 INFO 22:32:50,779 switching in a fresh Memtable for LocationInfo at CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log',
position=276)
 INFO 22:32:50,782 Enqueuing flush of Memtable-LocationInfo@18721294(192 bytes, 4 operations)
 INFO 22:32:50,783 Writing Memtable-LocationInfo@18721294(192 bytes, 4 operations)
 INFO 22:32:50,897 Completed flushing /mnt/ebs/data/system/LocationInfo-e-1-Data.db
DEBUG 22:32:50,898 Checking to see if compaction of LocationInfo would be useful
DEBUG 22:32:50,898 Discarding 0
DEBUG 22:32:50,899 discard completed log segments for CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log',
position=276), column family 0.
DEBUG 22:32:50,899 Marking replay position 276 on commit log CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log)
 INFO 22:32:50,908 Starting up server gossip
 INFO 22:32:50,928 Joining: getting load information
 INFO 22:32:50,928 Sleeping 90000 ms to wait for load information...
DEBUG 22:32:50,940 attempting to connect to node1-1.domain.com/10.250.106.111
DEBUG 22:32:51,044 attempting to connect to node1-1.domain.com/10.250.106.111
DEBUG 22:32:51,045 attempting to connect to /10.215.195.81
 INFO 22:32:51,051 Node /10.215.195.81 is now part of the cluster
DEBUG 22:32:51,051 Resetting pool for /10.215.195.81
DEBUG 22:32:51,052 Token 113416112894748789872342756657008344877 removed manually (endpoint
was unknown)
 INFO 22:32:51,052 Node /10.250.106.111 is now part of the cluster
DEBUG 22:32:51,053 Resetting pool for /10.250.106.111
DEBUG 22:32:51,053 Node /10.250.106.111 state normal, token 0
DEBUG 22:32:51,053 clearing cached endpoints
DEBUG 22:32:51,171 Applying AddKeyspace from /10.250.106.111
DEBUG 22:32:51,188 Applying migration 77e97d6c-b625-11df-8596-318df8b646e8
 INFO 22:32:51,189 switching in a fresh Memtable for Migrations at CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log',
position=7417)
 INFO 22:32:51,189 Enqueuing flush of Memtable-Migrations@18093512(4938 bytes, 1 operations)
 INFO 22:32:51,189 Writing Memtable-Migrations@18093512(4938 bytes, 1 operations)
 INFO 22:32:51,194 switching in a fresh Memtable for Schema at CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log',
position=7417)
 INFO 22:32:51,194 Enqueuing flush of Memtable-Schema@27402470(1784 bytes, 3 operations)
 INFO 22:32:51,274 Completed flushing /mnt/ebs/data/system/Migrations-e-1-Data.db
DEBUG 22:32:51,274 Checking to see if compaction of Migrations would be useful
DEBUG 22:32:51,275 Discarding 2
DEBUG 22:32:51,275 discard completed log segments for CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log',
position=7417), column family 2
.
DEBUG 22:32:51,275 Marking replay position 7417 on commit log CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log)
 INFO 22:32:51,275 Writing Memtable-Schema@27402470(1784 bytes, 3 operations)
 INFO 22:32:51,388 Completed flushing /mnt/ebs/data/system/Schema-e-1-Data.db
DEBUG 22:32:51,389 Checking to see if compaction of Schema would be useful
DEBUG 22:32:51,389 Discarding 3
DEBUG 22:32:51,389 discard completed log segments for CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log',
position=7417), column family 3
.
DEBUG 22:32:51,389 Marking replay position 7417 on commit log CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log)
DEBUG 22:32:51,406 Starting CFS MyColumnFamily
DEBUG 22:32:51,406 key cache capacity for MyColumnFamily is 200000
 INFO 22:32:51,407 Creating new commitlog segment /mnt/cassandra/commitlog/CommitLog-1283553171407.log
DEBUG 22:32:51,604 attempting to connect to node1-1.domain.com/10.250.106.111
 INFO 22:32:51,622 InetAddress /10.215.195.81 is now UP
 INFO 22:32:51,623 InetAddress /10.250.106.111 is now UP
 INFO 22:32:51,623 Started hinted handoff for endpoint /10.215.195.81
 INFO 22:32:51,631 Finished hinted handoff of 0 rows to endpoint /10.215.195.81
 INFO 22:32:51,632 Started hinted handoff for endpoint /10.250.106.111
 INFO 22:32:51,632 Finished hinted handoff of 0 rows to endpoint /10.250.106.111
DEBUG 22:32:51,918 GC for ParNew: 14 ms, 20561696 reclaimed leaving 149257720 used; max is
902627328
DEBUG 22:32:51,919 GC for ConcurrentMarkSweep: 73 ms, 4178480 reclaimed leaving 8520856 used;
max is 902627328
DEBUG 22:32:52,924 Disseminating load info ...
DEBUG 22:32:53,929 attempting to connect to /10.215.195.81
DEBUG 22:32:54,925 GC for ConcurrentMarkSweep: 73 ms, 212034048 reclaimed leaving 15214608
used; max is 902627328
DEBUG 22:33:52,931 Disseminating load info ...
DEBUG 22:34:20,929 ... got load info
 INFO 22:34:20,929 Joining: getting bootstrap token
DEBUG 22:34:20,931 token manually specified as 113416112894748789872342756657008344878
 INFO 22:34:20,932 Joining: sleeping 30000 ms for pending range setup
 INFO 22:34:50,935 Bootstrapping
DEBUG 22:34:50,935 Beginning bootstrap process
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:160)
Caused by: java.lang.IllegalStateException: replication factor (2) exceeds number of endpoints
(1)
        at org.apache.cassandra.locator.RackUnawareStrategy.calculateNaturalEndpoints(RackUnawareStrategy.java:57)
        at org.apache.cassandra.locator.AbstractReplicationStrategy.getRangeAddresses(AbstractReplicationStrategy.java:195)
        at org.apache.cassandra.dht.BootStrapper.getRangesWithSources(BootStrapper.java:155)
        at org.apache.cassandra.dht.BootStrapper.startBootstrap(BootStrapper.java:73)
        at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:467)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:408)
        at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:134)
        at org.apache.cassandra.service.AbstractCassandraDaemon.init(AbstractCassandraDaemon.java:57)
        ... 5 more
Cannot load daemon
Service exit with a return value of 3


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message