Return-Path: Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: (qmail 74914 invoked from network); 9 Sep 2010 16:24:00 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 9 Sep 2010 16:24:00 -0000 Received: (qmail 75270 invoked by uid 500); 9 Sep 2010 16:24:00 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 75252 invoked by uid 500); 9 Sep 2010 16:23:59 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 75244 invoked by uid 99); 9 Sep 2010 16:23:59 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 09 Sep 2010 16:23:59 +0000 X-ASF-Spam-Status: No, hits=-1996.4 required=10.0 tests=ALL_TRUSTED,FS_REPLICA X-Spam-Check-By: apache.org Received: from [140.211.11.22] (HELO thor.apache.org) (140.211.11.22) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 09 Sep 2010 16:23:56 +0000 Received: from thor (localhost [127.0.0.1]) by thor.apache.org (8.13.8+Sun/8.13.8) with ESMTP id o89GNYOK021852 for ; Thu, 9 Sep 2010 16:23:34 GMT Message-ID: <20591334.98121284049414932.JavaMail.jira@thor> Date: Thu, 9 Sep 2010 12:23:34 -0400 (EDT) From: "Gary Dusbabek (JIRA)" To: commits@cassandra.apache.org Subject: [jira] Updated: (CASSANDRA-1467) replication factor exceeds number of endpoints, when attempting to join a new node (but cluster has enough running nodes to fulfill RF) In-Reply-To: <12779662.23951283556333617.JavaMail.jira@thor> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/CASSANDRA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Dusbabek updated CASSANDRA-1467: ------------------------------------- Fix Version/s: 0.6.6 Affects Version/s: 0.6.5 > replication factor exceeds number of endpoints, when attempting to join a new node (but cluster has enough running nodes to fulfill RF) > --------------------------------------------------------------------------------------------------------------------------------------- > > Key: CASSANDRA-1467 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1467 > Project: Cassandra > Issue Type: Bug > Components: Core > Affects Versions: 0.6.5, 0.7 beta 1 > Environment: I'm using 0.7 beta1. Built from: http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.7.0/apache-cassandra-0.7.0-beta1-src.tar.gz using dpkg-buildpackage from it. > Reporter: Josep M. Blanquer > Assignee: Gary Dusbabek > Fix For: 0.6.6, 0.7 beta 2 > > > What happens here the following: > * given a healthy running cluster of 2 nodes (it used to be 3 but I manually killed one down) > * with a Keyspace having a ReplicationFactor of 2 > * (this cluster is operational, loaded with data and working well) > as soon as I want to bring up a new 3rd node: > * the node is detected by the current cluster > * but as soon as it tries to initiate bootstrap sequence it dies with: replication factor (2) exceeds number of endpoints (1) > I believe this is similar to #1343, but not quite the same, since the keyspace and everything is already created, and I'm not attempting any modification. This is purely bringing a new node up. > One extra tidbit of information, in case it's important...maybe it is not: > I have only set 1 seed node configured (this is a test setup). And when the new node starts coming up (before the crash) its nodetool ring only reports the seed one. > From the new node coming up (before it crashes): > root@node1-3:~# nodetool -h localhost -p 8080 ring > Address Status State Load Token > 10.250.106.111 Up Normal 116.3 GB 0 < = this the only configured seed node > From one of the other running nodes (this is the real status of the cluster): > root@node1-2:~# nodetool -h localhost -p 8080 ring > Address Status State Load Token > 113416112894748789872342756657008344878 > 10.250.106.111 Up Normal 116.3 GB 0 > 10.215.195.81 Up Normal 116.31 GB 56713727820156410577229101238628035242 > 10.246.65.221 Up Joining 7.63 KB 113416112894748789872342756657008344878 > Here's the full boot trace of the node is trying to join: > Java HotSpot(TM) Client VM warning: Can't detect initial thread stack location - find_vma failed > Create RMI registry on port 8081 > Get the platform's MBean server > Initialize the environment map > Create an RMI connector server > Start the RMI connector server on port 8081 > INFO 22:32:50,448 Loading settings from /etc/cassandra/cassandra.yaml > DEBUG 22:32:50,537 Syncing log with a period of 10000 > INFO 22:32:50,538 DiskAccessMode 'auto' determined to be standard, indexAccessMode is standard > DEBUG 22:32:50,548 setting auto_bootstrap to true > DEBUG 22:32:50,702 Starting CFS Statistics > DEBUG 22:32:50,710 key cache capacity for Statistics is 1 > DEBUG 22:32:50,711 Starting CFS Schema > DEBUG 22:32:50,712 key cache capacity for Schema is 1 > DEBUG 22:32:50,712 Starting CFS Migrations > DEBUG 22:32:50,713 key cache capacity for Migrations is 1 > DEBUG 22:32:50,713 Starting CFS LocationInfo > DEBUG 22:32:50,713 key cache capacity for LocationInfo is 1 > DEBUG 22:32:50,714 Starting CFS HintsColumnFamily > DEBUG 22:32:50,714 key cache capacity for HintsColumnFamily is 1 > INFO 22:32:50,736 Couldn't detect any schema definitions in local storage. > INFO 22:32:50,737 Found table data in data directories. Consider using JMX to call org.apache.cassandra.service.StorageService.loadSchemaFromYaml(). > DEBUG 22:32:50,738 opening keyspace system > INFO 22:32:50,757 Cassandra version: > INFO 22:32:50,757 Thrift API version: 10.0.0 > INFO 22:32:50,758 Saved Token not found. Using 113416112894748789872342756657008344878 > INFO 22:32:50,758 Saved ClusterName not found. Using SOMETHING > INFO 22:32:50,763 Creating new commitlog segment /mnt/cassandra/commitlog/CommitLog-1283553170763.log > DEBUG 22:32:50,767 Estimating compactions for LocationInfo > DEBUG 22:32:50,768 Estimating compactions for HintsColumnFamily > DEBUG 22:32:50,768 Estimating compactions for Migrations > DEBUG 22:32:50,768 Estimating compactions for Schema > DEBUG 22:32:50,768 Estimating compactions for Statistics > DEBUG 22:32:50,769 Checking to see if compaction of LocationInfo would be useful > DEBUG 22:32:50,769 Checking to see if compaction of HintsColumnFamily would be useful > DEBUG 22:32:50,769 Checking to see if compaction of Migrations would be useful > DEBUG 22:32:50,769 Checking to see if compaction of Schema would be useful > DEBUG 22:32:50,769 Checking to see if compaction of Statistics would be useful > INFO 22:32:50,779 switching in a fresh Memtable for LocationInfo at CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=276) > INFO 22:32:50,782 Enqueuing flush of Memtable-LocationInfo@18721294(192 bytes, 4 operations) > INFO 22:32:50,783 Writing Memtable-LocationInfo@18721294(192 bytes, 4 operations) > INFO 22:32:50,897 Completed flushing /mnt/ebs/data/system/LocationInfo-e-1-Data.db > DEBUG 22:32:50,898 Checking to see if compaction of LocationInfo would be useful > DEBUG 22:32:50,898 Discarding 0 > DEBUG 22:32:50,899 discard completed log segments for CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=276), column family 0. > DEBUG 22:32:50,899 Marking replay position 276 on commit log CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log) > INFO 22:32:50,908 Starting up server gossip > INFO 22:32:50,928 Joining: getting load information > INFO 22:32:50,928 Sleeping 90000 ms to wait for load information... > DEBUG 22:32:50,940 attempting to connect to node1-1.domain.com/10.250.106.111 > DEBUG 22:32:51,044 attempting to connect to node1-1.domain.com/10.250.106.111 > DEBUG 22:32:51,045 attempting to connect to /10.215.195.81 > INFO 22:32:51,051 Node /10.215.195.81 is now part of the cluster > DEBUG 22:32:51,051 Resetting pool for /10.215.195.81 > DEBUG 22:32:51,052 Token 113416112894748789872342756657008344877 removed manually (endpoint was unknown) > INFO 22:32:51,052 Node /10.250.106.111 is now part of the cluster > DEBUG 22:32:51,053 Resetting pool for /10.250.106.111 > DEBUG 22:32:51,053 Node /10.250.106.111 state normal, token 0 > DEBUG 22:32:51,053 clearing cached endpoints > DEBUG 22:32:51,171 Applying AddKeyspace from /10.250.106.111 > DEBUG 22:32:51,188 Applying migration 77e97d6c-b625-11df-8596-318df8b646e8 > INFO 22:32:51,189 switching in a fresh Memtable for Migrations at CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=7417) > INFO 22:32:51,189 Enqueuing flush of Memtable-Migrations@18093512(4938 bytes, 1 operations) > INFO 22:32:51,189 Writing Memtable-Migrations@18093512(4938 bytes, 1 operations) > INFO 22:32:51,194 switching in a fresh Memtable for Schema at CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=7417) > INFO 22:32:51,194 Enqueuing flush of Memtable-Schema@27402470(1784 bytes, 3 operations) > INFO 22:32:51,274 Completed flushing /mnt/ebs/data/system/Migrations-e-1-Data.db > DEBUG 22:32:51,274 Checking to see if compaction of Migrations would be useful > DEBUG 22:32:51,275 Discarding 2 > DEBUG 22:32:51,275 discard completed log segments for CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=7417), column family 2 > . > DEBUG 22:32:51,275 Marking replay position 7417 on commit log CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log) > INFO 22:32:51,275 Writing Memtable-Schema@27402470(1784 bytes, 3 operations) > INFO 22:32:51,388 Completed flushing /mnt/ebs/data/system/Schema-e-1-Data.db > DEBUG 22:32:51,389 Checking to see if compaction of Schema would be useful > DEBUG 22:32:51,389 Discarding 3 > DEBUG 22:32:51,389 discard completed log segments for CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=7417), column family 3 > . > DEBUG 22:32:51,389 Marking replay position 7417 on commit log CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log) > DEBUG 22:32:51,406 Starting CFS MyColumnFamily > DEBUG 22:32:51,406 key cache capacity for MyColumnFamily is 200000 > INFO 22:32:51,407 Creating new commitlog segment /mnt/cassandra/commitlog/CommitLog-1283553171407.log > DEBUG 22:32:51,604 attempting to connect to node1-1.domain.com/10.250.106.111 > INFO 22:32:51,622 InetAddress /10.215.195.81 is now UP > INFO 22:32:51,623 InetAddress /10.250.106.111 is now UP > INFO 22:32:51,623 Started hinted handoff for endpoint /10.215.195.81 > INFO 22:32:51,631 Finished hinted handoff of 0 rows to endpoint /10.215.195.81 > INFO 22:32:51,632 Started hinted handoff for endpoint /10.250.106.111 > INFO 22:32:51,632 Finished hinted handoff of 0 rows to endpoint /10.250.106.111 > DEBUG 22:32:51,918 GC for ParNew: 14 ms, 20561696 reclaimed leaving 149257720 used; max is 902627328 > DEBUG 22:32:51,919 GC for ConcurrentMarkSweep: 73 ms, 4178480 reclaimed leaving 8520856 used; max is 902627328 > DEBUG 22:32:52,924 Disseminating load info ... > DEBUG 22:32:53,929 attempting to connect to /10.215.195.81 > DEBUG 22:32:54,925 GC for ConcurrentMarkSweep: 73 ms, 212034048 reclaimed leaving 15214608 used; max is 902627328 > DEBUG 22:33:52,931 Disseminating load info ... > DEBUG 22:34:20,929 ... got load info > INFO 22:34:20,929 Joining: getting bootstrap token > DEBUG 22:34:20,931 token manually specified as 113416112894748789872342756657008344878 > INFO 22:34:20,932 Joining: sleeping 30000 ms for pending range setup > INFO 22:34:50,935 Bootstrapping > DEBUG 22:34:50,935 Beginning bootstrap process > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:160) > Caused by: java.lang.IllegalStateException: replication factor (2) exceeds number of endpoints (1) > at org.apache.cassandra.locator.RackUnawareStrategy.calculateNaturalEndpoints(RackUnawareStrategy.java:57) > at org.apache.cassandra.locator.AbstractReplicationStrategy.getRangeAddresses(AbstractReplicationStrategy.java:195) > at org.apache.cassandra.dht.BootStrapper.getRangesWithSources(BootStrapper.java:155) > at org.apache.cassandra.dht.BootStrapper.startBootstrap(BootStrapper.java:73) > at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:467) > at org.apache.cassandra.service.StorageService.initServer(StorageService.java:408) > at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:134) > at org.apache.cassandra.service.AbstractCassandraDaemon.init(AbstractCassandraDaemon.java:57) > ... 5 more > Cannot load daemon > Service exit with a return value of 3 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.