Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 7BF219D4A for ; Wed, 29 Feb 2012 10:24:24 +0000 (UTC) Received: (qmail 40343 invoked by uid 500); 29 Feb 2012 10:24:24 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 40323 invoked by uid 500); 29 Feb 2012 10:24:24 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 40315 invoked by uid 99); 29 Feb 2012 10:24:24 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 29 Feb 2012 10:24:24 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 29 Feb 2012 10:24:19 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 2CD8B85ACA for ; Wed, 29 Feb 2012 10:23:58 +0000 (UTC) Date: Wed, 29 Feb 2012 10:23:58 +0000 (UTC) From: "Peter Schuller (Commented) (JIRA)" To: commits@cassandra.apache.org Message-ID: <267805235.2295.1330511038185.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <152446057.758.1328501639572.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (CASSANDRA-3857) AIOOB during bootstrap causes insta-join as Normal in ring, serving bad reads MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/CASSANDRA-3857?page=3Dcom.atlas= sian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D= 13219071#comment-13219071 ]=20 Peter Schuller commented on CASSANDRA-3857: ------------------------------------------- While I certainly see the idea behind "if we have non-system tables, we're = not coming up from scratch, so we should not bootstrap", I wonder if there = is any expected use-case where the bootstrap flag is false and we don't wan= t to bootstrap? In other words, if SystemTable.isBootstrapped() is false and we have non-sy= stem tables, is that not a reason to bail with an assertion or runtime erro= r? If someone wants to take data off another node and have another node enter = the ring w/o streaming, it seems reasonable to expect them to actually set = autobootstrap to false. Thoughts? =20 > AIOOB during bootstrap causes insta-join as Normal in ring, serving bad r= eads > -------------------------------------------------------------------------= ---- > > Key: CASSANDRA-3857 > URL: https://issues.apache.org/jira/browse/CASSANDRA-3857 > Project: Cassandra > Issue Type: Bug > Reporter: Peter Schuller > Assignee: Peter Schuller > Priority: Critical > > The reason I realized there was something still fishy in CASSANDRA-3832 w= as that I thought I had seen nodes just join instantly into Normal state ea= rlier on trunk. When I tried to re-produce that I got what I'm mentioning i= n CASSANDRA-3832. While investigating *THAT* I now triggered what I believe= to be insta-join again. The node that was bootstrapping died with the belo= w exception during start-up, automatically re-started (due to monit in this= case) and then joined the ring. I'm pretty sure it didn't go into Joining = first, I *just* managed to catch the "watch" I had going on a nodetool | gr= ep on another node. Additionally, looking at the log below I see no evidenc= e of waiting for ring delay which is should as part of boot-strap, nor do I= see "Starting to bootstrap ... " logged. > A quick look at the code makes me confused how this is possible given tha= t the bootstrap flag is set *after* the point int he code where we bailed w= ith the exception. > As for the exception, I've seen something like this when investigating CA= SSANDRA-3831 and CASSANDRA-3417, but I never got to the bottom of it believ= ing it had to do with the lack of synchronization. I look back now and it s= eems my trunk is one that doesn't have the latest patch that I submitted to= CASSANDRA-3417, but I haven't looked at it in detail to determine whether = it fixes it. In any case, the insta-join is a problem. > {code} > INFO [main] 2012-02-06 03:59:34,059 StorageService.java (line 728) JOINI= NG: schema complete, ready to bootstrap > INFO [main] 2012-02-06 03:59:34,059 StorageService.java (line 728) JOINI= NG: getting bootstrap token > ERROR [main] 2012-02-06 03:59:34,062 AbstractCassandraDaemon.java (line 3= 70) Exception encountered during startup > java.lang.ArrayIndexOutOfBoundsException: -2 > at java.util.ArrayList.get(ArrayList.java:324) > at org.apache.cassandra.locator.TokenMetadata.getPredecessor(Toke= nMetadata.java:521) > at org.apache.cassandra.locator.TokenMetadata.getPrimaryRangeFor(= TokenMetadata.java:463) > at org.apache.cassandra.locator.TokenMetadata.pendingRangeChanges= (TokenMetadata.java:111) > at org.apache.cassandra.dht.BootStrapper$1.compare(BootStrapper.j= ava:144) > at org.apache.cassandra.dht.BootStrapper$1.compare(BootStrapper.j= ava:141) > at java.util.Arrays.mergeSort(Arrays.java:1270) > at java.util.Arrays.mergeSort(Arrays.java:1281) > at java.util.Arrays.mergeSort(Arrays.java:1281) > at java.util.Arrays.mergeSort(Arrays.java:1281) > at java.util.Arrays.mergeSort(Arrays.java:1281) > at java.util.Arrays.mergeSort(Arrays.java:1281) > at java.util.Arrays.sort(Arrays.java:1210) > at java.util.Collections.sort(Collections.java:159) > at org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootS= trapper.java:140) > at org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStr= apper.java:116) > at org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootSt= rapper.java:111) > at org.apache.cassandra.service.StorageService.joinTokenRing(Stor= ageService.java:585) > at org.apache.cassandra.service.StorageService.initServer(Storage= Service.java:502) > at org.apache.cassandra.service.StorageService.initServer(Storage= Service.java:411) > at org.apache.cassandra.service.AbstractCassandraDaemon.setup(Abs= tractCassandraDaemon.java:231) > at org.apache.cassandra.service.AbstractCassandraDaemon.activate(= AbstractCassandraDaemon.java:353) > at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaem= on.java:107) > INFO [StorageServiceShutdownHook] 2012-02-06 03:59:34,065 MessagingServi= ce.java (line 541) Waiting for messaging service to quiesce > INFO [ACCEPT-/10.40.33.128] 2012-02-06 03:59:34,065 MessagingService.jav= a (line 697) MessagingService shutting down server thread. > INFO [main] 2012-02-06 03:59:39,760 AbstractCassandraDaemon.java (line 1= 00) Logging initialized > INFO [main] 2012-02-06 03:59:39,763 AbstractCassandraDaemon.java (line 1= 21) JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_24 > INFO [main] 2012-02-06 03:59:39,764 AbstractCassandraDaemon.java (line 1= 22) Heap size: 12845645824/12845645824 > INFO [main] 2012-02-06 03:59:39,764 AbstractCassandraDaemon.java (line 1= 23) Classpath: /usr/local/cassandra/current/conf/atla/cassandra.prod.cuckoo= :/usr/local/cassandra/current/vendor/cassandra/build/classes:/usr/local/cas= sandra/current/vendor/cassandra/lib/antlr-3.2.jar:/usr/local/cassandra/curr= ent/vendor/cassandra/lib/apache-cassandra-clientutil-twttr-cassandra-trunk-= upstream-but-buildable-log-calc-pending-6.jar:/usr/local/cassandra/current/= vendor/cassandra/lib/apache-cassandra-thrift-twttr-cassandra-trunk-upstream= -but-buildable-log-calc-pending-6.jar:/usr/local/cassandra/current/vendor/c= assandra/lib/apache-cassandra-twttr-cassandra-trunk-upstream-but-buildable-= log-calc-pending-6.jar:/usr/local/cassandra/current/vendor/cassandra/lib/av= ro-1.4.0-fixes.jar:/usr/local/cassandra/current/vendor/cassandra/lib/avro-1= .4.0-sources-fixes.jar:/usr/local/cassandra/current/vendor/cassandra/lib/co= mmons-cli-1.1.jar:/usr/local/cassandra/current/vendor/cassandra/lib/commons= -codec-1.2.jar:/usr/local/cassandra/current/vendor/cassandra/lib/commons-la= ng-2.4.jar:/usr/local/cassandra/current/vendor/cassandra/lib/compress-lzf-0= .8.4.jar:/usr/local/cassandra/current/vendor/cassandra/lib/concurrentlinked= hashmap-lru-1.2.jar:/usr/local/cassandra/current/vendor/cassandra/lib/guava= -r08.jar:/usr/local/cassandra/current/vendor/cassandra/lib/high-scale-lib-1= .1.2.jar:/usr/local/cassandra/current/vendor/cassandra/lib/jackson-core-asl= -1.9.2.jar:/usr/local/cassandra/current/vendor/cassandra/lib/jackson-mapper= -asl-1.9.2.jar:/usr/local/cassandra/current/vendor/cassandra/lib/jamm-0.2.5= .jar:/usr/local/cassandra/current/vendor/cassandra/lib/jline-0.9.94.jar:/us= r/local/cassandra/current/vendor/cassandra/lib/json-simple-1.1.jar:/usr/loc= al/cassandra/current/vendor/cassandra/lib/libthrift-0.7.0.jar:/usr/local/ca= ssandra/current/vendor/cassandra/lib/log4j-1.2.16.jar:/usr/local/cassandra/= current/vendor/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/local/cassan= dra/current/vendor/cassandra/lib/slf4j-api-1.6.1.jar:/usr/local/cassandra/c= urrent/vendor/cassandra/lib/slf4j-log4j12-1.6.1.jar:/usr/local/cassandra/cu= rrent/vendor/cassandra/lib/snakeyaml-1.6.jar:/usr/local/cassandra/current/v= endor/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/local/cassandra/current/ve= ndor/cassandra/lib/snaptree-0.1-SNAPSHOT.jar:/usr/local/cassandra/current/v= endor/cassandra/build/lib/jars/*.jar:/usr/local/cassandra/current/vendor/ca= ssandra/build/lib/twitter-jars/*.jar:/usr/local/cassandra/current/vendor/ca= ssandra/lib/twitter-jars/*.jar > INFO [main] 2012-02-06 03:59:39,765 CLibrary.java (line 66) JNA not foun= d. Native methods will be disabled. > INFO [main] 2012-02-06 03:59:39,772 DatabaseDescriptor.java (line 127) L= oading settings from file:/usr/local/cassandra/releases/20120206035358/conf= /atla/cassandra.prod.cuckoo-upstream-trunk/cassandra.yaml > INFO [main] 2012-02-06 03:59:39,874 DatabaseDescriptor.java (line 181) D= iskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap > INFO [main] 2012-02-06 03:59:39,884 DatabaseDescriptor.java (line 246) G= lobal memtable threshold is enabled at 4083MB > INFO [main] 2012-02-06 03:59:39,992 CacheService.java (line 100) Initial= izing key cache with capacity of 2 MBs. > INFO [main] 2012-02-06 03:59:40,003 CacheService.java (line 111) Schedul= ing key cache save to each 14400 seconds (going to save all keys). > INFO [main] 2012-02-06 03:59:40,004 CacheService.java (line 125) Initial= izing row cache with capacity of 0 MBs and provider org.apache.cassandra.ca= che.ConcurrentLinkedHashCacheProvider > INFO [main] 2012-02-06 03:59:40,006 CacheService.java (line 137) Schedul= ing row cache save to each 0 seconds (going to save all keys). > INFO [SSTableBatchOpen:1] 2012-02-06 03:59:40,070 SSTableReader.java (li= ne 156) Opening /data/diskarray/tables/system/schema_columnfamilies/system-= schema_columnfamilies-hc-30 (1399 bytes) > INFO [SSTableBatchOpen:2] 2012-02-06 03:59:40,071 SSTableReader.java (li= ne 156) Opening /data/diskarray/tables/system/schema_columnfamilies/system-= schema_columnfamilies-hc-29 (1399 bytes) > INFO [SSTableBatchOpen:3] 2012-02-06 03:59:40,071 SSTableReader.java (li= ne 156) Opening /data/diskarray/tables/system/schema_columnfamilies/system-= schema_columnfamilies-hc-31 (1399 bytes) > INFO [SSTableBatchOpen:1] 2012-02-06 03:59:40,106 SSTableReader.java (li= ne 156) Opening /data/diskarray/tables/system/schema_keyspaces/system-schem= a_keyspaces-hc-30 (255 bytes) > INFO [SSTableBatchOpen:3] 2012-02-06 03:59:40,107 SSTableReader.java (li= ne 156) Opening /data/diskarray/tables/system/schema_keyspaces/system-schem= a_keyspaces-hc-29 (255 bytes) > INFO [SSTableBatchOpen:2] 2012-02-06 03:59:40,107 SSTableReader.java (li= ne 156) Opening /data/diskarray/tables/system/schema_keyspaces/system-schem= a_keyspaces-hc-31 (255 bytes) > INFO [SSTableBatchOpen:1] 2012-02-06 03:59:40,116 SSTableReader.java (li= ne 156) Opening /data/diskarray/tables/system/LocationInfo/system-LocationI= nfo-hc-245 (6775 bytes) > INFO [main] 2012-02-06 03:59:40,391 AbstractCassandraDaemon.java (line 2= 08) completed pre-loading (4 keys) key cache. > INFO [main] 2012-02-06 03:59:40,504 CommitLog.java (line 142) Replaying = /data/disk1/commitlog/CommitLog-650613166587178.log, /data/disk1/commitlog/= CommitLog-650613060900385.log > INFO [main] 2012-02-06 03:59:40,508 CommitLog.java (line 185) Replaying = /data/disk1/commitlog/CommitLog-650613166587178.log > INFO [main] 2012-02-06 03:59:40,509 CommitLog.java (line 339) Finished r= eading /data/disk1/commitlog/CommitLog-650613166587178.log > INFO [main] 2012-02-06 03:59:40,509 CommitLog.java (line 185) Replaying = /data/disk1/commitlog/CommitLog-650613060900385.log > INFO [main] 2012-02-06 03:59:40,584 CommitLog.java (line 339) Finished r= eading /data/disk1/commitlog/CommitLog-650613060900385.log > INFO [main] 2012-02-06 03:59:40,588 ColumnFamilyStore.java (line 590) En= queuing flush of Memtable-Versions@275980095(141/176 serialized/live bytes,= 3 ops) > INFO [FlushWriter:1] 2012-02-06 03:59:40,600 Memtable.java (line 252) Wr= iting Memtable-Versions@275980095(141/176 serialized/live bytes, 3 ops) > INFO [FlushWriter:1] 2012-02-06 03:59:40,618 Memtable.java (line 293) Co= mpleted flushing /data/diskarray/tables/system/Versions/system-Versions-hc-= 1-Data.db (305 bytes) > INFO [main] 2012-02-06 03:59:40,625 CommitLog.java (line 144) Log replay= complete, 3 replayed mutations > INFO [main] 2012-02-06 03:59:40,637 StorageService.java (line 416) Cassa= ndra version: twttr-cassandra-trunk-upstream-but-buildable-log-calc-pending= -6 > INFO [main] 2012-02-06 03:59:40,638 StorageService.java (line 417) Thrif= t API version: 19.28.0 > INFO [main] 2012-02-06 03:59:40,639 StorageService.java (line 418) CQL s= upported versions: 2.0.0,3.0.0-beta1 (default: 2.0.0) > INFO [main] 2012-02-06 03:59:40,639 StorageService.java (line 431) Loadi= ng persisted ring state > INFO [main] 2012-02-06 03:59:40,670 StorageService.java (line 512) Start= ing up server gossip > INFO [main] 2012-02-06 03:59:40,676 ColumnFamilyStore.java (line 590) En= queuing flush of Memtable-LocationInfo@1983702821(29/36 serialized/live byt= es, 1 ops) > INFO [FlushWriter:1] 2012-02-06 03:59:40,676 Memtable.java (line 252) Wr= iting Memtable-LocationInfo@1983702821(29/36 serialized/live bytes, 1 ops) > INFO [FlushWriter:1] 2012-02-06 03:59:40,683 Memtable.java (line 293) Co= mpleted flushing /data/diskarray/tables/system/LocationInfo/system-Location= Info-hc-246-Data.db (80 bytes) > INFO [main] 2012-02-06 03:59:40,689 MessagingService.java (line 286) Sta= rting Messaging Service on port 7000 > WARN [main] 2012-02-06 03:59:40,697 StorageService.java (line 619) Gener= ated random token 100588447224685279983438741296967523984. Random tokens wi= ll result in an unbalanced ring; see http://wiki.apache.org/cassandra/Opera= tions > INFO [main] 2012-02-06 03:59:40,697 ColumnFamilyStore.java (line 590) En= queuing flush of Memtable-LocationInfo@44130647(53/66 serialized/live bytes= , 2 ops) > INFO [FlushWriter:1] 2012-02-06 03:59:40,698 Memtable.java (line 252) Wr= iting Memtable-LocationInfo@44130647(53/66 serialized/live bytes, 2 ops) > INFO [FlushWriter:1] 2012-02-06 03:59:40,703 Memtable.java (line 293) Co= mpleted flushing /data/diskarray/tables/system/LocationInfo/system-Location= Info-hc-247-Data.db (163 bytes) > INFO [main] 2012-02-06 03:59:40,705 StorageService.java (line 1049) Node= /10.40.33.128 state jump to normal > INFO [main] 2012-02-06 03:59:40,711 StorageService.java (line 1275) calc= ulate pending ranges called, took 6 ms > INFO [main] 2012-02-06 03:59:40,711 StorageService.java (line 641) Boots= trap/Replace/Move completed! Now serving reads. > INFO [main] 2012-02-06 03:59:40,712 Mx4jTool.java (line 72) Will not loa= d MX4J, mx4j-tools.jar is not in the classpath > INFO [main] 2012-02-06 03:59:40,736 CassandraDaemon.java (line 125) Bind= ing thrift service to localhost/127.0.0.1:9160 > INFO [main] 2012-02-06 03:59:40,738 CassandraDaemon.java (line 134) Usin= g TFastFramedTransport with a max frame size of 15728640 bytes. > INFO [main] 2012-02-06 03:59:40,740 CassandraDaemon.java (line 161) Usin= g synchronous/threadpool thrift server on localhost/127.0.0.1 : 9160 > INFO [Thread-2] 2012-02-06 03:59:40,741 CassandraDaemon.java (line 213) = Listening for thrift clients... > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrato= rs: https://issues.apache.org/jira/secure/ContactAdministrators!default.jsp= a For more information on JIRA, see: http://www.atlassian.com/software/jira