Return-Path: Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: (qmail 64705 invoked from network); 16 Sep 2010 22:00:18 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 16 Sep 2010 22:00:18 -0000 Received: (qmail 43790 invoked by uid 500); 16 Sep 2010 22:00:16 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 43747 invoked by uid 500); 16 Sep 2010 22:00:16 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 43739 invoked by uid 99); 16 Sep 2010 22:00:16 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 16 Sep 2010 22:00:16 +0000 X-ASF-Spam-Status: No, hits=2.2 required=10.0 tests=FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of gurpreet.singh@gmail.com designates 209.85.214.172 as permitted sender) Received: from [209.85.214.172] (HELO mail-iw0-f172.google.com) (209.85.214.172) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 16 Sep 2010 22:00:11 +0000 Received: by iwn3 with SMTP id 3so1604934iwn.31 for ; Thu, 16 Sep 2010 14:59:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:content-type; bh=3JjjtQKZTk5WSpHZjciBoDSR72vNhzJqXTuotxrkKV8=; b=Z9d1VNaEOvEfTfo4ja0+G06E+80rE0MY2zNM3T9XkfDmyUmP+/aT9f+xRJYs0h4Lw4 4J3HU0/xCeNTq3WLo4BCS2LcJnKvqiwwHhO6f/Gam3SGNgSFfbFy/XX2yvLbRkWR5Zx8 6zQjf2JOlVPUfjc2pMNvHE30GnnbeR0pbnpV4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=pcT4gRa1Q/oB/A/muUMD7H4ypdT+gcg+vDt8Iya4PevcPU3VC3/xL6POP+lBHhC8Gy LV9PrRs35zC+YKYgYA4Nd+6ca97/OzZk15qrXUnGAw54Exg2wuGOPuiF9XOO9Ftpd0jo DEYcCYCfnLycL0UwG+ODvbqKm/xrJ+ifB003U= MIME-Version: 1.0 Received: by 10.231.167.196 with SMTP id r4mr3988700iby.29.1284674390851; Thu, 16 Sep 2010 14:59:50 -0700 (PDT) Received: by 10.231.199.198 with HTTP; Thu, 16 Sep 2010 14:59:50 -0700 (PDT) In-Reply-To: References: Date: Thu, 16 Sep 2010 14:59:50 -0700 Message-ID: Subject: Re: Bootstrapping stays stuck From: Gurpreet Singh To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=005045016763b80ead049067906b --005045016763b80ead049067906b Content-Type: text/plain; charset=ISO-8859-1 Thanks to driftx from cassandra IRC channel for helping out. This was resolved by increasing the rpc timeout for the bootstrap process. On Wed, Sep 15, 2010 at 11:43 AM, Gurpreet Singh wrote: > This problem still stays unresolved despite numerous restarts to the > cluster. I cant seem to find a way out of this one, and I am not really > looking for a workaround, kinda need this to work if i need to go to > production. > > Turned on the ALL logging in log4j, and now I see the following exception > (EOFException) on the destination. After receiving each file, it seems to be > throwing this exception. The transfer is successful except for this > exception. The source successful declares the transfer complete. But the > destination does not move out of the bootstrapping mode, and just sits > there. > > DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line > 65) Receiving stream: finished reading chunk, awaiting more > DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line > 87) Removing stream context > /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Index.db:522051369 > DEBUG [Thread-15] 2010-09-15 10:56:59,767 StreamCompletionHandler.java > (line 73) Sending a streaming finished message with > org.apache.cassandra.streaming.CompletedFileStatus@54828e7 to IP1 > TRACE [Thread-15] 2010-09-15 10:56:59,769 IncomingTcpConnection.java (line > 82) eof reading from socket; closing > java.io.EOFException > at java.io.DataInputStream.readInt(Unknown Source) > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59) > DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line > 51) Receiving stream > DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line > 54) Creating file for > /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db > DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line > 65) Receiving stream: finished reading chunk, awaiting more > DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line > 87) Removing stream context > /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db:7489045 > DEBUG [Thread-16] 2010-09-15 10:56:59,876 StreamCompletionHandler.java > (line 73) Sending a streaming finished message with > org.apache.cassandra.streaming.CompletedFileStatus@7b41a32f to IP1 > TRACE [Thread-16] 2010-09-15 10:56:59,876 IncomingTcpConnection.java (line > 82) eof reading from socket; closing > java.io.EOFException > at java.io.DataInputStream.readInt(Unknown Source) > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59) > > /G > > On Tue, Sep 14, 2010 at 11:40 AM, Gurpreet Singh > wrote: > >> Hi Vineet, >> I have tracked the nodetool streams to completion each time. Below are the >> logs on the source and destination node. There are 3 sstables being >> transferred, and the transfer seems to be successful. However, after the >> streams finish, the source prints out messages about the dropped messages, >> which may point to the problem. ideas? I checked port 7000 is open for >> communication. 9160 is not up on the node being bootstrapped, but that comes >> up after the node is bootstrapped, is that right? >> >> Thanks a ton, >> /G >> >> *Logs on the source node (IP2):* >> * >> * >> INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 79) >> Flushing memtables for userdata... >> INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 95) >> Performing anticompaction ... >> INFO [COMPACTION-POOL:1] 2010-09-14 09:54:07,900 CompactionManager.java >> (line 339) AntiCompacting >> [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_list_items-5823-Data.db')] >> INFO [GC inspection] 2010-09-14 09:56:54,712 GCInspector.java (line 129) >> GC for ParNew: 212 ms, 29033016 reclaimed leaving 579419360 used; max is >> 4415946752 >> INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,508 CompactionManager.java >> (line 396) AntiCompacted to >> /data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Data.db. >> 49074138589/36770836242 bytes for 5990912 keys. Time: 1438607ms. >> INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,528 CompactionManager.java >> (line 339) AntiCompacting >> [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user-22-Data.db')] >> INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,839 CompactionManager.java >> (line 396) AntiCompacted to >> /data/mysql/cassandrastorage/userdata/stream/user-24-Data.db. >> 28185244/21126422 bytes for 47722 keys. Time: 2310ms. >> INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,840 CompactionManager.java >> (line 339) AntiCompacting >> [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_lists-502-Data.db')] >> INFO [COMPACTION-POOL:1] 2010-09-14 10:21:08,606 CompactionManager.java >> (line 396) AntiCompacted to >> /data/mysql/cassandrastorage/userdata/stream/user_lists-504-Data.db. >> 2927724285/2195768325 bytes for 3976118 keys. Time: 179766ms. >> INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,607 StreamOut.java (line 127) >> Stream context metadata >> /data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Index.db:522051369, >> 3 >> sstables./data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Filter.db:7489045, >> 3 >> sstables./data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Data.db:36770836242, >> 3 >> sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Index.db:3373143, >> 3 >> sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Filter.db:59965, >> 3 >> sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Data.db:21126422, >> 3 >> sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Index.db:282956452, >> 3 >> sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Filter.db:4970125, >> 3 >> sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Data.db:2195768325 >> INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,608 StreamOut.java (line 132) >> Sending a stream initiate message to IP1... >> INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,608 StreamOut.java (line 137) >> Waiting for transfer to IP1 to complete >> *WARN [DroppedMessagesLogger] 2010-09-14 10:28:00,592 >> MessagingService.java (line 501) Dropped 9 messages in the last 1000ms* >> * *INFO [STREAM-STAGE:1] 2010-09-14 10:28:00,605 StreamOut.java (line >> 141) Done with transfer to IP1 >> INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:00,670 >> SSTableDeletingReference.java (line 104) Deleted >> /data/cassandra/datadir/cassandradb/system/LocationInfo-17-Data.db >> *WARN [DroppedMessagesLogger] 2010-09-14 10:28:01,602 >> MessagingService.java (line 501) Dropped 1 messages in the last 1000ms* >> * *INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,133 >> SSTableDeletingReference.java (line 104) Deleted >> /data/mysql/cassandrastorage/system/LocationInfo-19-Data.db >> INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,134 >> SSTableDeletingReference.java (line 104) Deleted >> /data/cassandra/datadir/cassandradb/system/LocationInfo-18-Data.db >> INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,134 >> SSTableDeletingReference.java (line 104) Deleted >> /data/cassandra/datadir/cassandradb/system/LocationInfo-20-Data.db >> >> >> >> *Logs on new node being bootstrapped (IP1):* >> >> INFO [main] 2010-09-14 09:53:37,788 BootStrapper.java (line 104) New token >> will be 149298847080838649048722691811739653739 to assume load from IP2 >> INFO [main] 2010-09-14 09:53:37,789 StorageService.java (line 388) >> Joining: sleeping 30000 ms for pending range setup >> INFO [main] 2010-09-14 09:54:07,792 StorageService.java (line 388) >> Bootstrapping >> INFO [Thread-17] 2010-09-14 10:26:40,699 SSTableReader.java (line 120) >> Sampling index for >> /data/mysql/cassandrastorage/userdata/user_list_items-2-Data.db >> INFO [Thread-17] 2010-09-14 10:26:51,453 StreamCompletionHandler.java >> (line 64) Streaming added >> /data/mysql/cassandrastorage/userdata/user_list_items-2-Data.db >> INFO [Thread-23] 2010-09-14 10:26:52,219 SSTableReader.java (line 120) >> Sampling index for >> /data/cassandra/datadir/cassandradb/userdata/user-2-Data.db >> INFO [Thread-23] 2010-09-14 10:26:52,311 StreamCompletionHandler.java >> (line 64) Streaming added >> /data/cassandra/datadir/cassandradb/userdata/user-2-Data.db >> INFO [Timer-0] 2010-09-14 10:27:13,676 Gossiper.java (line 180) >> InetAddress IP2 is now dead. >> INFO [Thread-26] 2010-09-14 10:27:18,858 SSTableReader.java (line 120) >> Sampling index for >> /data/cassandra/datadir/cassandradb/userdata/user_lists-2-Data.db >> INFO [Thread-26] 2010-09-14 10:27:24,645 StreamCompletionHandler.java >> (line 64) Streaming added >> /data/cassandra/datadir/cassandradb/userdata/user_lists-2-Data.db >> INFO [HINTED-HANDOFF-POOL:1] 2010-09-14 10:28:00,605 >> HintedHandOffManager.java (line 165) Started hinted handoff for endPoint IP2 >> INFO [GMFD:1] 2010-09-14 10:28:00,605 Gossiper.java (line 578) >> InetAddress IP2 is now UP >> INFO [HINTED-HANDOFF-POOL:1] 2010-09-14 10:28:00,606 >> HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to >> endpoint IP2 >> INFO [FLUSH-TIMER] 2010-09-14 10:57:52,381 ColumnFamilyStore.java (line >> 357) LocationInfo has reached its threshold; switching in a fresh Memtable >> at >> CommitLogContext(file='/data/cassandra/datadir/commitlog/CommitLog-1284483127473.log', >> position=1353) >> INFO [FLUSH-TIMER] 2010-09-14 10:57:52,382 ColumnFamilyStore.java (line >> 609) Enqueuing flush of Memtable-LocationInfo@2057079871(210 bytes, 6 >> operations) >> INFO [FLUSH-WRITER-POOL:1] 2010-09-14 10:57:52,382 Memtable.java (line >> 148) Writing Memtable-LocationInfo@2057079871(210 bytes, 6 operations) >> INFO [FLUSH-WRITER-POOL:1] 2010-09-14 10:57:52,464 Memtable.java (line >> 162) Completed flushing >> /data/cassandra/datadir/cassandradb/system/LocationInfo-10-Data.db >> >> >> On Tue, Sep 14, 2010 at 10:52 AM, vineet daniel wrote: >> >>> Hi Gurpreet >>> >>> What is the output of nodetool -h streams -->( to see >>> what is going on between the nodes) . If you dont see anything happening >>> try switching off firewall or iptables. >>> >>> >>> Regards >>> Vineet Daniel >>> Cell : +918106217121 >>> Websites : >>> Blog | Linkedin >>> | Twitter >>> >>> >>> >>> >>> >>> On Tue, Sep 14, 2010 at 11:11 PM, Gurpreet Singh < >>> gurpreet.singh@gmail.com> wrote: >>> >>>> I tried this again, it happenned yet again. >>>> This time while the transfer messages seemed tobe in order, i also >>>> noticed that the source logs talk about having 9 dropped messages in the >>>> last 1000 ms. The only activity on the whole cluster is this bootstrapping, >>>> there is no read/write traffic going on. >>>> >>>> /G >>>> >>>> On Tue, Sep 14, 2010 at 10:05 AM, Gurpreet Singh < >>>> gurpreet.singh@gmail.com> wrote: >>>> >>>>> I am using cassandra 0.6.5. >>>>> >>>>> >>>>> On Tue, Sep 14, 2010 at 9:16 AM, Gurpreet Singh < >>>>> gurpreet.singh@gmail.com> wrote: >>>>> >>>>>> Hi, >>>>>> I have a cassandra cluster of 4 machines, and I am trying to bootstrap >>>>>> 2 more machines, one at a time. >>>>>> For both these machines, the bootstrapping stays stuck after the >>>>>> streaming is done. >>>>>> >>>>>> When the nodes come up for bootstrapping, I see all the relevant >>>>>> messages about getting a new token, assuming load from a particular host. I >>>>>> see a couple of nodes anticompacting data to send, and at a later point the >>>>>> node that is bootstrapping prints the right streaming mesgs. However, once >>>>>> the streaming is over, the node just doesnt do anything. Previously while >>>>>> bootstrapping, I have seen that after the streaming is done, the node >>>>>> restarts and becomes part of the ring by itself. I dont see this behaviour >>>>>> with both the nodes I tried today. >>>>>> I even restarted all the nodes in the cluster, and tried bootstrapping >>>>>> one of the nodes again, but it again was stuck after streaming. It seems to >>>>>> have copied the relevant load as well. >>>>>> Any ideas as to what could be going on here? >>>>>> >>>>>> /G >>>>>> >>>>> >>>>> >>>> >>> >> > --005045016763b80ead049067906b Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Thanks to driftx from cassandra IRC channel for helping out.=A0
This wa= s resolved by increasing the rpc timeout for the bootstrap process.

=
On Wed, Sep 15, 2010 at 11:43 AM, Gurpreet Singh= <gurpreet= .singh@gmail.com> wrote:
This problem still stays unresolved despite= numerous restarts to the cluster. I cant seem to find a way out of this on= e, and I am not really looking for a workaround, kinda need this to work if= i need to go to production.=A0

Turned on the ALL logging in log4j, and now I see the follow= ing exception (EOFException) on the destination. After receiving each file,= it seems to be throwing this exception. The transfer is successful except = for this exception. The source successful declares the transfer complete. B= ut the destination does not move out of the bootstrapping mode, and just si= ts there.

DEBUG [Thread-15] 2010-09-15 10:56:59,767 Incoming= StreamReader.java (line 65) Receiving stream: finished reading chunk, await= ing more
DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStream= Reader.java (line 87) Removing stream context /data/cassandra/datadir/cassa= ndradb/userdata/user_list_items-tmp-1-Index.db:522051369
DEBUG [Thread-15] 2010-09-15 10:56:59,767 StreamCompletionHandler.java= (line 73) Sending a streaming finished message with org.apache.cassandra.s= treaming.CompletedFileStatus@54828e7 to IP1
TRACE [Thread-15] 201= 0-09-15 10:56:59,769 IncomingTcpConnection.java (line 82) eof reading from = socket; closing
java.io.EOFException
=A0=A0 =A0 =A0 =A0at java.io.DataInputS= tream.readInt(Unknown Source)
=A0=A0 =A0 =A0 =A0at org.apache.cas= sandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59)
DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (lin= e 51) Receiving stream
DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (l= ine 54) Creating file for /data/cassandra/datadir/cassandradb/userdata/user= _list_items-tmp-1-Filter.db
DEBUG [Thread-16] 2010-09-15 10:56:59= ,876 IncomingStreamReader.java (line 65) Receiving stream: finished reading= chunk, awaiting more
DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (l= ine 87) Removing stream context /data/cassandra/datadir/cassandradb/userdat= a/user_list_items-tmp-1-Filter.db:7489045
DEBUG [Thread-16] 2010-= 09-15 10:56:59,876 StreamCompletionHandler.java (line 73) Sending a streami= ng finished message with org.apache.cassandra.streaming.CompletedFileStatus= @7b41a32f to IP1
TRACE [Thread-16] 2010-09-15 10:56:59,876 IncomingTcpConnection.java (= line 82) eof reading from socket; closing
java.io.EOFException
=A0=A0 =A0 =A0 =A0at java.io.DataInputStream.readInt(Unknown Source= )
=A0=A0 =A0 =A0 =A0at org.apache.cassandra.net.IncomingTcpConnection.ru= n(IncomingTcpConnection.java:59)

/G

On Tue, Sep 14, 2010 at 11:40 AM, Gurpreet Singh <gurpreet.singh@gm= ail.com> wrote:
Hi Vineet,
I have tracked the nodetool s= treams to completion each time. Below are the logs on the source and destin= ation node. There are 3 sstables being transferred, and the transfer seems = to be successful. However, after the streams finish, the source prints out = messages about the dropped messages, which may point to the problem. ideas?= I checked port 7000 is open for communication. 9160 is not up on the node = being bootstrapped, but that comes up after the node is bootstrapped, is th= at right?

Thanks a ton,
/G

Logs on the source node (IP2):

INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line = 79) Flushing memtables for userdata...
=A0INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line = 95) Performing anticompaction ...
=A0INFO [COMPACTION-POOL:1] 2010-09-14 09:54:07,900 CompactionManager.= java (line 339) AntiCompacting [org.apache.cassandra.io.SSTableReader(path= =3D'/data/cassandra/datadir/cassandradb/userdata/user_list_items-5823-D= ata.db')]
=A0INFO [GC inspection] 2010-09-14 09:56:54,712 GCInspector.java (line= 129) GC for ParNew: 212 ms, 29033016 reclaimed leaving 579419360 used; max= is 4415946752
=A0INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,50= 8 CompactionManager.java (line 396) AntiCompacted to /data/cassandra/datadi= r/cassandradb/userdata/stream/user_list_items-5825-Data.db. =A049074138589/= 36770836242 bytes for 5990912 keys. =A0Time: 1438607ms.
=A0INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,528 CompactionManager.= java (line 339) AntiCompacting [org.apache.cassandra.io.SSTableReader(path= =3D'/data/cassandra/datadir/cassandradb/userdata/user-22-Data.db')]=
=A0INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,839 CompactionManager.= java (line 396) AntiCompacted to /data/mysql/cassandrastorage/userdata/stre= am/user-24-Data.db. =A028185244/21126422 bytes for 47722 keys. =A0Time: 231= 0ms.
=A0INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,840 CompactionManager.= java (line 339) AntiCompacting [org.apache.cassandra.io.SSTableReader(path= =3D'/data/cassandra/datadir/cassandradb/userdata/user_lists-502-Data.db= ')]
=A0INFO [COMPACTION-POOL:1] 2010-09-14 10:21:08,606 CompactionManager.= java (line 396) AntiCompacted to /data/mysql/cassandrastorage/userdata/stre= am/user_lists-504-Data.db. =A02927724285/2195768325 bytes for 3976118 keys.= =A0Time: 179766ms.
=A0INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,607 StreamOut.java (line = 127) Stream context metadata /data/cassandra/datadir/cassandradb/userdata/s= tream/user_list_items-5825-Index.db:522051369, =A03 sstables./data/cassandr= a/datadir/cassandradb/userdata/stream/user_list_items-5825-Filter.db:748904= 5, =A03 sstables./data/cassandra/datadir/cassandradb/userdata/stream/user_l= ist_items-5825-Data.db:36770836242, =A03 sstables./data/mysql/cassandrastor= age/userdata/stream/user-24-Index.db:3373143, =A03 sstables./data/mysql/cas= sandrastorage/userdata/stream/user-24-Filter.db:59965, =A03 sstables./data/= mysql/cassandrastorage/userdata/stream/user-24-Data.db:21126422, =A03 sstab= les./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Index.db:28= 2956452, =A03 sstables./data/mysql/cassandrastorage/userdata/stream/user_li= sts-504-Filter.db:4970125, =A03 sstables./data/mysql/cassandrastorage/userd= ata/stream/user_lists-504-Data.db:2195768325
=A0INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,608 StreamOut.java (line = 132) Sending a stream initiate message to IP1...
=A0INFO [STREAM-= STAGE:1] 2010-09-14 10:21:08,608 StreamOut.java (line 137) Waiting for tran= sfer to IP1=A0to complete
=A0WARN [DroppedMessagesLogger] 2010-09-14 10:28:00,592 MessagingSe= rvice.java (line 501) Dropped 9 messages in the last 1000ms
<= b>=A0INFO [STREAM-STAGE:1] 2010-09-14 10:28:00,605 StreamOut.java (line= 141) Done with transfer to IP1
=A0INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:00,670 SSTableDeletin= gReference.java (line 104) Deleted /data/cassandra/datadir/cassandradb/syst= em/LocationInfo-17-Data.db
=A0WARN [DroppedMessagesLogger] 201= 0-09-14 10:28:01,602 MessagingService.java (line 501) Dropped 1 messages in= the last 1000ms
=A0INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,133 SSTable= DeletingReference.java (line 104) Deleted /data/mysql/cassandrastorage/syst= em/LocationInfo-19-Data.db
=A0INFO [SSTABLE-CLEANUP-TIMER] 2010-0= 9-14 10:28:06,134 SSTableDeletingReference.java (line 104) Deleted /data/ca= ssandra/datadir/cassandradb/system/LocationInfo-18-Data.db
=A0INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,134 SSTableDeletin= gReference.java (line 104) Deleted /data/cassandra/datadir/cassandradb/syst= em/LocationInfo-20-Data.db



Logs on new node being bootstrapped (IP1):
=
INFO [main] 2010-09-14 09:53:37,788 BootStrapper.java (line = 104) New token will be 149298847080838649048722691811739653739 to assume lo= ad from IP2
=A0INFO [main] 2010-09-14 09:53:37,789 StorageService.java (line 388) = Joining: sleeping 30000 ms for pending range setup
=A0INFO [main]= 2010-09-14 09:54:07,792 StorageService.java (line 388) Bootstrapping
=A0INFO [Thread-17] 2010-09-14 10:26:40,699 SSTableReader.java (line 1= 20) Sampling index for /data/mysql/cassandrastorage/userdata/user_list_item= s-2-Data.db
=A0INFO [Thread-17] 2010-09-14 10:26:51,453 StreamCom= pletionHandler.java (line 64) Streaming added /data/mysql/cassandrastorage/= userdata/user_list_items-2-Data.db
=A0INFO [Thread-23] 2010-09-14 10:26:52,219 SSTableReader.java (line 1= 20) Sampling index for /data/cassandra/datadir/cassandradb/userdata/user-2-= Data.db
=A0INFO [Thread-23] 2010-09-14 10:26:52,311 StreamComplet= ionHandler.java (line 64) Streaming added /data/cassandra/datadir/cassandra= db/userdata/user-2-Data.db
=A0INFO [Timer-0] 2010-09-14 10:27:13,676 Gossiper.java (line 180) Ine= tAddress IP2 is now dead.
=A0INFO [Thread-26] 2010-09-14 10:27:18= ,858 SSTableReader.java (line 120) Sampling index for /data/cassandra/datad= ir/cassandradb/userdata/user_lists-2-Data.db
=A0INFO [Thread-26] 2010-09-14 10:27:24,645 StreamCompletionHandler.ja= va (line 64) Streaming added /data/cassandra/datadir/cassandradb/userdata/u= ser_lists-2-Data.db
=A0INFO [HINTED-HANDOFF-POOL:1] 2010-09-14 10= :28:00,605 HintedHandOffManager.java (line 165) Started hinted handoff for = endPoint IP2
=A0INFO [GMFD:1] 2010-09-14 10:28:00,605 Gossiper.java (line 578) Inet= Address IP2=A0is now UP
=A0INFO [HINTED-HANDOFF-POOL:1] 2010-09-1= 4 10:28:00,606 HintedHandOffManager.java (line 222) Finished hinted handoff= of 0 rows to endpoint IP2
=A0INFO [FLUSH-TIMER] 2010-09-14 10:57:52,381 ColumnFamilyStore.java (= line 357) LocationInfo has reached its threshold; switching in a fresh Memt= able at CommitLogContext(file=3D'/data/cassandra/datadir/commitlog/Comm= itLog-1284483127473.log', position=3D1353)
=A0INFO [FLUSH-TIMER] 2010-09-14 10:57:52,382 ColumnFamilyStore.java (= line 609) Enqueuing flush of Memtable-LocationInfo@2057079871(210 bytes, 6 = operations)
=A0INFO [FLUSH-WRITER-POOL:1] 2010-09-14 10:57:52,382= Memtable.java (line 148) Writing Memtable-LocationInfo@2057079871(210 byte= s, 6 operations)
=A0INFO [FLUSH-WRITER-POOL:1] 2010-09-14 10:57:52,464 Memtable.java (l= ine 162) Completed flushing /data/cassandra/datadir/cassandradb/system/Loca= tionInfo-10-Data.db
=A0

On Tue, Sep 14, 2010 at 10:52 AM, vineet daniel <vineetdaniel@gmail.c= om> wrote:
Hi Gurpreet

What is the output of=A0 = nodetool=A0-h=A0<= hostname/IP>=A0streams=A0 -->( to see what is going on between the no= des) . If you dont see anything happening try switching off= firewall or iptables.


Regards
Vineet Daniel<= /font>
Cel= l=A0=A0=A0=A0=A0=A0=A0=A0=A0 : +918106217121
Websites :

Blog =A0 |=A0=A0 <= /span>Linkedin<= /a>= =A0 |=A0 Twitt= er





On Tue, Sep 14, 2010 at 11:11 PM, Gurpre= et Singh <gurpreet.singh@gmail.com> wrote:
I tried this again, it happenned yet again.=A0
This time while the tran= sfer messages seemed tobe in order, i also noticed that the source logs tal= k about having 9 dropped messages in the last 1000 ms. The only activity on= the whole cluster is this bootstrapping, there is no read/write traffic go= ing on.

/G
=

On Tue, Sep 14, 2010 at 10:05 AM, = Gurpreet Singh <gurpreet.singh@gmail.com> wrote:
I am using cassandra 0.= 6.5.


On Tue, Sep 14,= 2010 at 9:16 AM, Gurpreet Singh <gurpreet.singh@gmail.com><= /span> wrote:
Hi,
I have a cassandra cluster of 4 machines, and I am trying to = bootstrap 2 more machines, one at a time.
For both these machines, the = bootstrapping stays stuck after the streaming is done.

When the nodes come up for bootstrapping, I see all the relevant messa= ges about getting a new token, assuming load from a particular host. I see = a couple of nodes anticompacting data to send, and at a later point the nod= e that is bootstrapping prints the right streaming mesgs. However, once the= streaming is over, the node just doesnt do anything. Previously while boot= strapping, I have seen that after the streaming is done, the node restarts = and becomes part of the ring by itself. I dont see this behaviour with both= the nodes I tried today.
I even restarted all the nodes in the cluster, and tried bootstrapping= one of the nodes again, but it again was stuck after streaming. It seems t= o have copied the relevant load as well.
Any ideas as to what cou= ld be going on here?

/G






--005045016763b80ead049067906b--