cassandra-commits mailing list archives

From "Arya Goudarzi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-1221) loadbalance operation never completes on a 3 node cluster
Date Wed, 14 Jul 2010 00:06:53 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888116#action_12888116 ]

Arya Goudarzi commented on CASSANDRA-1221:
------------------------------------------

Hi Gary,

I was able to reproduce this using today's nightly build. This time I used a smaller data
set (500000 keys) and got the following:

[agoudarzi@cas-test3 scripts]$ nodetool --host 10.50.26.132 ring   
Address         Status State   Load            Token
                                               160348796167900510561059505917619274541
10.50.26.134    Up     Normal  116.98 MB       32717880524093094169411234083126184860
10.50.26.132    Up     Leaving 58.58 MB        75101027859180840627831025901565139619
10.50.26.133    Up     Normal  117.09 MB       160348796167900510561059505917619274541
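
As a rough aid for reading the ring output above, the following sketch estimates how much of the ring each node owns. It assumes the cluster uses RandomPartitioner with a token space of 2**127, which the output does not confirm, and it takes each node to own the range from the previous token (exclusive) up to its own token. The function name `ownership` is made up for illustration.

```python
RING_SIZE = 2 ** 127  # assumption: RandomPartitioner token space


def ownership(tokens):
    """Map each address to the fraction of the ring that ends at its token."""
    ordered = sorted(tokens.items(), key=lambda item: item[1])
    shares = {}
    for i, (address, token) in enumerate(ordered):
        # For i == 0 the index -1 wraps around to the highest token.
        previous = ordered[i - 1][1]
        shares[address] = ((token - previous) % RING_SIZE) / RING_SIZE
    return shares


# Tokens copied from the ring output above.
ring = {
    "10.50.26.134": 32717880524093094169411234083126184860,
    "10.50.26.132": 75101027859180840627831025901565139619,
    "10.50.26.133": 160348796167900510561059505917619274541,
}

for address, share in ownership(ring).items():
    print(f"{address}: {share:.1%}")
```

Under these assumptions, 10.50.26.133 holds slightly more than half of the ring and the other two nodes hold roughly a quarter each.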


[agoudarzi@cas-test3 scripts]$ nodetool --host 10.50.26.132 streams
Mode: Leaving: streaming data to other nodes
Streaming to: /10.50.26.133
   /var/lib/cassandra/data/Keyspace1/Standard1-d-17-Data.db/[(0,54080834)]
Not receiving any streams.
[agoudarzi@cas-test3 scripts]$ nodetool --host 10.50.26.133 streams
Mode: Normal
Not sending any streams.
Not receiving any streams.

From the logs of 10.50.26.132 it seems that it tried to tell 10.50.26.133 to claim its
stream:

INFO [STREAM-STAGE:1] 2010-07-13 16:50:35,994 StreamOut.java (line 135) Sending a stream initiate message to /10.50.26.133 ...
INFO [STREAM-STAGE:1] 2010-07-13 16:50:35,994 StreamOut.java (line 140) Waiting for transfer to /10.50.26.133 to complete

But nothing in 133's log acknowledges receipt of the request from 132 and, as you can see above,
133 reports that it is not receiving any streams. This has been going on for the past hour or so.
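
The mismatch described above (the sender lists a stream target, the target lists no incoming stream) can be checked mechanically. The sketch below parses text shaped like the `nodetool streams` output in this comment. The helper names `parse_streams` and `one_sided` are made up for illustration, and the parser only assumes the line prefixes visible above ("Mode:", "Streaming to:", "Streaming from:").

```python
def parse_streams(output):
    """Extract the mode and peer addresses from `nodetool streams`-style text."""
    state = {"mode": None, "to": [], "from": []}
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Mode:"):
            state["mode"] = line[len("Mode:"):].strip()
        elif line.startswith("Streaming to:"):
            # Peer is printed as "/<ip>"; keep the part after the slash.
            state["to"].append(line.split("/", 1)[1].strip())
        elif line.startswith("Streaming from:"):
            state["from"].append(line.split("/", 1)[1].strip())
    return state


def one_sided(sender_ip, sender_out, receiver_ip, receiver_out):
    """True if the sender targets the receiver but the receiver sees nothing."""
    sender = parse_streams(sender_out)
    receiver = parse_streams(receiver_out)
    return receiver_ip in sender["to"] and sender_ip not in receiver["from"]


# Output copied from the two `nodetool ... streams` calls above.
OUT_132 = """Mode: Leaving: streaming data to other nodes
Streaming to: /10.50.26.133
   /var/lib/cassandra/data/Keyspace1/Standard1-d-17-Data.db/[(0,54080834)]
Not receiving any streams."""

OUT_133 = """Mode: Normal
Not sending any streams.
Not receiving any streams."""

print(one_sided("10.50.26.132", OUT_132, "10.50.26.133", OUT_133))
```

Running a check like this periodically would show whether the receiver ever starts reporting the stream or the transfer stays one-sided.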

-Arya



> loadbalance operation never completes on a 3 node cluster
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-1221
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1221
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7
>            Reporter: Gary Dusbabek
>            Assignee: Gary Dusbabek
>             Fix For: 0.7
>
>
> Arya Goudarzi reports:
> Please confirm if this is an issue and should be reported or I am doing something wrong. I could not find anything relevant on JIRA:
> Playing with 0.7 nightly (today's build), I set up a 3 node cluster this way:
>  - Added one node;
>  - Loaded default schema with RF 1 from YAML using JMX;
>  - Loaded 2M keys using py_stress;
>  - Bootstrapped a second node;
>  - Cleaned up the first node;
>  - Bootstrapped a third node;
>  - Cleaned up the second node;
> I got the following ring:
> Address       Status     Load          Range                                      Ring
>                                       154293670372423273273390365393543806425
> 10.50.26.132  Up         518.63 MB     69164917636305877859094619660693892452     |<--|
> 10.50.26.134  Up         234.8 MB      111685517405103688771527967027648896391    |   |
> 10.50.26.133  Up         235.26 MB     154293670372423273273390365393543806425    |-->|
> Now I ran:
> nodetool --host 10.50.26.132 loadbalance
> It's been going for a while. I checked the streams
> nodetool --host 10.50.26.134 streams
> Mode: Normal
> Not sending any streams.
> Streaming from: /10.50.26.132
>   Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-3-Data.db/[(0,22206096), (22206096,27271682)]
>   Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-4-Data.db/[(0,15180462), (15180462,18656982)]
>   Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-5-Data.db/[(0,353139829), (353139829,433883659)]
>   Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-6-Data.db/[(0,366336059), (366336059,450095320)]
> nodetool --host 10.50.26.132 streams
> Mode: Leaving: streaming data to other nodes
> Streaming to: /10.50.26.134
>   /var/lib/cassandra/data/Keyspace1/Standard1-d-48-Data.db/[(0,366336059), (366336059,450095320)]
> Not receiving any streams.
> These have been going for the past 2 hours.
> I looked in the logs of the node with the 134 IP address and saw this:
> INFO [GOSSIP_STAGE:1] 2010-06-22 16:30:54,679 StorageService.java (line 603) Will not change my token ownership to /10.50.26.132
> So, to my understanding from the wikis, loadbalance is supposed to decommission the node by sending its tokens to other nodes and then bootstrap it again. It's been stuck in streaming for the past 2 hours and the size of the ring has not changed. The log on the first node shows that streaming started hours ago:
> INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,255 StreamOut.java (line 72) Beginning transfer process to /10.50.26.134 for ranges (154293670372423273273390365393543806425,69164917636305877859094619660693892452]
>  INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,255 StreamOut.java (line 82) Flushing memtables for Keyspace1...
>  INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,266 StreamOut.java (line 128) Stream context metadata [/var/lib/cassandra/data/Keyspace1/Standard1-d-48-Data.db/[(0,366336059), (366336059,450095320)]] 1 sstables.
>  INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,267 StreamOut.java (line 135) Sending a stream initiate message to /10.50.26.134 ...
>  INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,267 StreamOut.java (line 140) Waiting for transfer to /10.50.26.134 to complete
>  INFO [FLUSH-TIMER] 2010-06-22 17:36:53,370 ColumnFamilyStore.java (line 359) LocationInfo has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1277249454413.log', position=720)
>  INFO [FLUSH-TIMER] 2010-06-22 17:36:53,370 ColumnFamilyStore.java (line 622) Enqueuing flush of Memtable(LocationInfo)@1637794189
>  INFO [FLUSH-WRITER-POOL:1] 2010-06-22 17:36:53,370 Memtable.java (line 149) Writing Memtable(LocationInfo)@1637794189
>  INFO [FLUSH-WRITER-POOL:1] 2010-06-22 17:36:53,528 Memtable.java (line 163) Completed flushing /var/lib/cassandra/data/system/LocationInfo-d-9-Data.db
>  INFO [MEMTABLE-POST-FLUSHER:1] 2010-06-22 17:36:53,529 ColumnFamilyStore.java (line 374) Discarding 1000
> Nothing more after this line.
> Am I doing something wrong?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

