incubator-cassandra-user mailing list archives

From Jason Tyler <jaty...@yahoo-inc.com>
Subject nodetool move seems slow
Date Wed, 04 Jun 2014 21:34:37 GMT
Hello,

We have a 5-node cluster running Cassandra 1.2.16, with a significant amount of data:


Address        Rack        Status State   Load            Owns                Token
                                                                              6783174585269344219
10.198.xx.xx1  rack1       Up     Normal  2.59 TB         60.00%              -9223372036854775808
10.198.xx.xx2  rack1       Up     Normal  1.49 TB         40.00%              -5534023222112865485
10.198.xx.xx3  rack1       Up     Normal  2.18 TB         53.23%              -1844674407370955162
10.198.xx.xx4  rack1       Up     Normal  2.86 TB         80.00%              5534023222112865484
10.198.xx.xx5  rack1       Up     Moving  2.32 TB         66.77%              6783174585269344219



The first three nodes (.xx1 - .xx3 above) were at the desired tokens, so I issued a move on
.xx4:

nodetool move 1844674407370955161
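
For reference, the target token above can be reproduced with a short calculation. This is only a sketch, assuming the cluster uses Murmur3Partitioner (signed 64-bit token range) and that "desired" means evenly spaced tokens; the function name balanced_tokens is hypothetical, not part of any Cassandra tool.

```python
def balanced_tokens(node_count):
    """Return evenly spaced tokens over the signed 64-bit range.

    Assumption: Murmur3Partitioner, whose tokens run from -2**63 to 2**63 - 1.
    """
    step = 2**64 // node_count  # width of each node's slice of the ring
    return [-2**63 + i * step for i in range(node_count)]


# Print one token per node for a 5-node ring.
for index, token in enumerate(balanced_tokens(5), start=1):
    print(f"node {index}: {token}")
```

Under those assumptions, the first three values match the tokens already held by .xx1 through .xx3, and the fourth is the value passed to nodetool move.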


That was about 40 hours ago!


When I do nodetool netstats, I do see apparent progress:


jatyler@xx4:~$ nodetool netstats
Mode: MOVING
Not sending any streams.
Streaming from: /10.198.xx.xx2
   SyncCore: /var/cassandra/data/SyncCore/file-ic-31475-Data.db sections=1 progress=0/77699597 - 0%
…
   SyncCore: /var/cassandra/data/SyncCore/anotherFile-ic-32252-Data.db sections=1 progress=0/1254063427 - 0%
Read Repair Statistics:
Attempted: 8047367
Mismatch (Blocking): 97327
Mismatch (Background): 74369
Pool Name                    Active   Pending      Completed
Commands                        n/a         0      472255111
Responses                       n/a         1      749751322



I wrote 'apparent progress' because it reports “MOVING” and the Pending counts for Commands/Responses
change over time.  However, I haven’t seen the progress of any individual .db file go above
0%.
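
To track this over time rather than eyeballing each file, the progress=done/total fields can be summed across all pending files. The following is a minimal sketch that assumes the netstats line format shown above; the helper name streaming_totals is hypothetical, and the sample text contains only the two file lines quoted in this message.

```python
import re

# Matches the "progress=<bytes done>/<bytes total>" field on each file line.
PROGRESS = re.compile(r"progress=(\d+)/(\d+)")


def streaming_totals(netstats_text):
    """Sum bytes received and bytes expected across all listed files."""
    done = total = 0
    for match in PROGRESS.finditer(netstats_text):
        done += int(match.group(1))
        total += int(match.group(2))
    return done, total


sample = """
   SyncCore: /var/cassandra/data/SyncCore/file-ic-31475-Data.db sections=1 progress=0/77699597
   SyncCore: /var/cassandra/data/SyncCore/anotherFile-ic-32252-Data.db sections=1 progress=0/1254063427
"""

done, total = streaming_totals(sample)
percent = 100.0 * done / total if total else 0.0
print(f"{done} of {total} bytes received ({percent:.1f}%)")
```

Running this against successive netstats snapshots would show whether the received byte count is moving at all.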

Meanwhile, the system appears to have plenty of unused bandwidth, from 'iostat -x -m 1':


Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    56.00 1338.00  171.00    57.59     0.89    79.36     0.57    0.38   0.17  25.30

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          22.77    1.82    2.35    0.20    0.00   72.86

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00  785.00    0.00    33.80     0.00    88.17     0.27    0.35   0.18  14.10

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          20.16    2.05    2.22    0.20    0.00   75.37




Is 40 hours too long for this move?  Should I be seeing individual .db files report more progress?
 Should I start with the first box (even though the token appears correct)?


Any thoughts would be greatly appreciated.

THX


Cheers,

~Jason
*******
