cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gurpreet Singh <gurpreet.si...@gmail.com>
Subject Re: Bootstrapping stays stuck
Date Thu, 16 Sep 2010 21:59:50 GMT
Thanks to driftx from cassandra IRC channel for helping out.
This was resolved by increasing the rpc timeout for the bootstrap process.

On Wed, Sep 15, 2010 at 11:43 AM, Gurpreet Singh
<gurpreet.singh@gmail.com>wrote:

> This problem still stays unresolved despite numerous restarts to the
> cluster. I cant seem to find a way out of this one, and I am not really
> looking for a workaround, kinda need this to work if i need to go to
> production.
>
> Turned on the ALL logging in log4j, and now I see the following exception
> (EOFException) on the destination. After receiving each file, it seems to be
> throwing this exception. The transfer is successful except for this
> exception. The source successful declares the transfer complete. But the
> destination does not move out of the bootstrapping mode, and just sits
> there.
>
> DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line
> 65) Receiving stream: finished reading chunk, awaiting more
> DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line
> 87) Removing stream context
> /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Index.db:522051369
> DEBUG [Thread-15] 2010-09-15 10:56:59,767 StreamCompletionHandler.java
> (line 73) Sending a streaming finished message with
> org.apache.cassandra.streaming.CompletedFileStatus@54828e7 to IP1
> TRACE [Thread-15] 2010-09-15 10:56:59,769 IncomingTcpConnection.java (line
> 82) eof reading from socket; closing
> java.io.EOFException
>         at java.io.DataInputStream.readInt(Unknown Source)
>         at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59)
> DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line
> 51) Receiving stream
> DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line
> 54) Creating file for
> /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db
> DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line
> 65) Receiving stream: finished reading chunk, awaiting more
> DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line
> 87) Removing stream context
> /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db:7489045
> DEBUG [Thread-16] 2010-09-15 10:56:59,876 StreamCompletionHandler.java
> (line 73) Sending a streaming finished message with
> org.apache.cassandra.streaming.CompletedFileStatus@7b41a32f to IP1
> TRACE [Thread-16] 2010-09-15 10:56:59,876 IncomingTcpConnection.java (line
> 82) eof reading from socket; closing
> java.io.EOFException
>         at java.io.DataInputStream.readInt(Unknown Source)
>         at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59)
>
> /G
>
> On Tue, Sep 14, 2010 at 11:40 AM, Gurpreet Singh <gurpreet.singh@gmail.com
> > wrote:
>
>> Hi Vineet,
>> I have tracked the nodetool streams to completion each time. Below are the
>> logs on the source and destination node. There are 3 sstables being
>> transferred, and the transfer seems to be successful. However, after the
>> streams finish, the source prints out messages about the dropped messages,
>> which may point to the problem. ideas? I checked port 7000 is open for
>> communication. 9160 is not up on the node being bootstrapped, but that comes
>> up after the node is bootstrapped, is that right?
>>
>> Thanks a ton,
>> /G
>>
>> *Logs on the source node (IP2):*
>> *
>> *
>> INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 79)
>> Flushing memtables for userdata...
>>  INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 95)
>> Performing anticompaction ...
>>  INFO [COMPACTION-POOL:1] 2010-09-14 09:54:07,900 CompactionManager.java
>> (line 339) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_list_items-5823-Data.db')]
>>  INFO [GC inspection] 2010-09-14 09:56:54,712 GCInspector.java (line 129)
>> GC for ParNew: 212 ms, 29033016 reclaimed leaving 579419360 used; max is
>> 4415946752
>>  INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,508 CompactionManager.java
>> (line 396) AntiCompacted to
>> /data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Data.db.
>>  49074138589/36770836242 bytes for 5990912 keys.  Time: 1438607ms.
>>  INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,528 CompactionManager.java
>> (line 339) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user-22-Data.db')]
>>  INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,839 CompactionManager.java
>> (line 396) AntiCompacted to
>> /data/mysql/cassandrastorage/userdata/stream/user-24-Data.db.
>>  28185244/21126422 bytes for 47722 keys.  Time: 2310ms.
>>  INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,840 CompactionManager.java
>> (line 339) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_lists-502-Data.db')]
>>  INFO [COMPACTION-POOL:1] 2010-09-14 10:21:08,606 CompactionManager.java
>> (line 396) AntiCompacted to
>> /data/mysql/cassandrastorage/userdata/stream/user_lists-504-Data.db.
>>  2927724285/2195768325 bytes for 3976118 keys.  Time: 179766ms.
>>  INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,607 StreamOut.java (line 127)
>> Stream context metadata
>> /data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Index.db:522051369,
>>  3
>> sstables./data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Filter.db:7489045,
>>  3
>> sstables./data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Data.db:36770836242,
>>  3
>> sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Index.db:3373143,
>>  3
>> sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Filter.db:59965,
>>  3
>> sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Data.db:21126422,
>>  3
>> sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Index.db:282956452,
>>  3
>> sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Filter.db:4970125,
>>  3
>> sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Data.db:2195768325
>>  INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,608 StreamOut.java (line 132)
>> Sending a stream initiate message to IP1...
>>  INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,608 StreamOut.java (line 137)
>> Waiting for transfer to IP1 to complete
>>  *WARN [DroppedMessagesLogger] 2010-09-14 10:28:00,592
>> MessagingService.java (line 501) Dropped 9 messages in the last 1000ms*
>> * *INFO [STREAM-STAGE:1] 2010-09-14 10:28:00,605 StreamOut.java (line
>> 141) Done with transfer to IP1
>>  INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:00,670
>> SSTableDeletingReference.java (line 104) Deleted
>> /data/cassandra/datadir/cassandradb/system/LocationInfo-17-Data.db
>>  *WARN [DroppedMessagesLogger] 2010-09-14 10:28:01,602
>> MessagingService.java (line 501) Dropped 1 messages in the last 1000ms*
>> * *INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,133
>> SSTableDeletingReference.java (line 104) Deleted
>> /data/mysql/cassandrastorage/system/LocationInfo-19-Data.db
>>  INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,134
>> SSTableDeletingReference.java (line 104) Deleted
>> /data/cassandra/datadir/cassandradb/system/LocationInfo-18-Data.db
>>  INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,134
>> SSTableDeletingReference.java (line 104) Deleted
>> /data/cassandra/datadir/cassandradb/system/LocationInfo-20-Data.db
>>
>>
>>
>> *Logs on new node being bootstrapped (IP1):*
>>
>> INFO [main] 2010-09-14 09:53:37,788 BootStrapper.java (line 104) New token
>> will be 149298847080838649048722691811739653739 to assume load from IP2
>>  INFO [main] 2010-09-14 09:53:37,789 StorageService.java (line 388)
>> Joining: sleeping 30000 ms for pending range setup
>>  INFO [main] 2010-09-14 09:54:07,792 StorageService.java (line 388)
>> Bootstrapping
>>  INFO [Thread-17] 2010-09-14 10:26:40,699 SSTableReader.java (line 120)
>> Sampling index for
>> /data/mysql/cassandrastorage/userdata/user_list_items-2-Data.db
>>  INFO [Thread-17] 2010-09-14 10:26:51,453 StreamCompletionHandler.java
>> (line 64) Streaming added
>> /data/mysql/cassandrastorage/userdata/user_list_items-2-Data.db
>>  INFO [Thread-23] 2010-09-14 10:26:52,219 SSTableReader.java (line 120)
>> Sampling index for
>> /data/cassandra/datadir/cassandradb/userdata/user-2-Data.db
>>  INFO [Thread-23] 2010-09-14 10:26:52,311 StreamCompletionHandler.java
>> (line 64) Streaming added
>> /data/cassandra/datadir/cassandradb/userdata/user-2-Data.db
>>  INFO [Timer-0] 2010-09-14 10:27:13,676 Gossiper.java (line 180)
>> InetAddress IP2 is now dead.
>>  INFO [Thread-26] 2010-09-14 10:27:18,858 SSTableReader.java (line 120)
>> Sampling index for
>> /data/cassandra/datadir/cassandradb/userdata/user_lists-2-Data.db
>>  INFO [Thread-26] 2010-09-14 10:27:24,645 StreamCompletionHandler.java
>> (line 64) Streaming added
>> /data/cassandra/datadir/cassandradb/userdata/user_lists-2-Data.db
>>  INFO [HINTED-HANDOFF-POOL:1] 2010-09-14 10:28:00,605
>> HintedHandOffManager.java (line 165) Started hinted handoff for endPoint IP2
>>  INFO [GMFD:1] 2010-09-14 10:28:00,605 Gossiper.java (line 578)
>> InetAddress IP2 is now UP
>>  INFO [HINTED-HANDOFF-POOL:1] 2010-09-14 10:28:00,606
>> HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to
>> endpoint IP2
>>  INFO [FLUSH-TIMER] 2010-09-14 10:57:52,381 ColumnFamilyStore.java (line
>> 357) LocationInfo has reached its threshold; switching in a fresh Memtable
>> at
>> CommitLogContext(file='/data/cassandra/datadir/commitlog/CommitLog-1284483127473.log',
>> position=1353)
>>  INFO [FLUSH-TIMER] 2010-09-14 10:57:52,382 ColumnFamilyStore.java (line
>> 609) Enqueuing flush of Memtable-LocationInfo@2057079871(210 bytes, 6
>> operations)
>>  INFO [FLUSH-WRITER-POOL:1] 2010-09-14 10:57:52,382 Memtable.java (line
>> 148) Writing Memtable-LocationInfo@2057079871(210 bytes, 6 operations)
>>  INFO [FLUSH-WRITER-POOL:1] 2010-09-14 10:57:52,464 Memtable.java (line
>> 162) Completed flushing
>> /data/cassandra/datadir/cassandradb/system/LocationInfo-10-Data.db
>>
>>
>> On Tue, Sep 14, 2010 at 10:52 AM, vineet daniel <vineetdaniel@gmail.com>wrote:
>>
>>> Hi Gurpreet
>>>
>>> What is the output of  nodetool -h <hostname/IP> streams  -->( to see
>>> what is going on between the nodes) . If you dont see anything happening
>>> try switching off firewall or iptables.
>>>
>>>
>>> Regards
>>> Vineet Daniel
>>> Cell          : +918106217121
>>> Websites :
>>> Blog <http://vinetedaniel.blogspot.com>   |   Linkedin<http://in.linkedin.com/in/vineetdaniel>
>>> |  Twitter <https://twitter.com/vineetdaniel>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Sep 14, 2010 at 11:11 PM, Gurpreet Singh <
>>> gurpreet.singh@gmail.com> wrote:
>>>
>>>> I tried this again, it happenned yet again.
>>>> This time while the transfer messages seemed tobe in order, i also
>>>> noticed that the source logs talk about having 9 dropped messages in the
>>>> last 1000 ms. The only activity on the whole cluster is this bootstrapping,
>>>> there is no read/write traffic going on.
>>>>
>>>> /G
>>>>
>>>> On Tue, Sep 14, 2010 at 10:05 AM, Gurpreet Singh <
>>>> gurpreet.singh@gmail.com> wrote:
>>>>
>>>>> I am using cassandra 0.6.5.
>>>>>
>>>>>
>>>>> On Tue, Sep 14, 2010 at 9:16 AM, Gurpreet Singh <
>>>>> gurpreet.singh@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I have a cassandra cluster of 4 machines, and I am trying to bootstrap
>>>>>> 2 more machines, one at a time.
>>>>>> For both these machines, the bootstrapping stays stuck after the
>>>>>> streaming is done.
>>>>>>
>>>>>> When the nodes come up for bootstrapping, I see all the relevant
>>>>>> messages about getting a new token, assuming load from a particular
host. I
>>>>>> see a couple of nodes anticompacting data to send, and at a later
point the
>>>>>> node that is bootstrapping prints the right streaming mesgs. However,
once
>>>>>> the streaming is over, the node just doesnt do anything. Previously
while
>>>>>> bootstrapping, I have seen that after the streaming is done, the
node
>>>>>> restarts and becomes part of the ring by itself. I dont see this
behaviour
>>>>>> with both the nodes I tried today.
>>>>>> I even restarted all the nodes in the cluster, and tried bootstrapping
>>>>>> one of the nodes again, but it again was stuck after streaming. It
seems to
>>>>>> have copied the relevant load as well.
>>>>>> Any ideas as to what could be going on here?
>>>>>>
>>>>>> /G
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Mime
View raw message