From: Jason Tyler
To: "user@cassandra.apache.org"
CC: Francois Richard
Subject: nodetool move seems slow
Date: Wed, 4 Jun 2014 21:34:37 +0000

Hello,

We have a 5-node cluster running Cassandra 1.2.16, with a significant amount of data:

Address        Rack   Status  State   Load     Owns    Token
                                                       6783174585269344219
10.198.xx.xx1  rack1  Up      Normal  2.59 TB  60.00%  -9223372036854775808
10.198.xx.xx2  rack1  Up      Normal  1.49 TB  40.00%  -5534023222112865485
10.198.xx.xx3  rack1  Up      Normal  2.18 TB  53.23%  -1844674407370955162
10.198.xx.xx4  rack1  Up      Normal  2.86 TB  80.00%  5534023222112865484
10.198.xx.xx5  rack1  Up      Moving  2.32 TB  66.77%  6783174585269344219



The first three nodes (.xx1 - .xx3 above) were at the desired tokens, so I issued a move on .xx4:

nodetool move 1844674407370955161 
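
For reference, here is a minimal sketch of how the desired tokens can be derived. It assumes the tokens live on a signed 64-bit ring (the negative values in the ring output suggest this) and that "desired" means evenly spaced. The function name balanced_tokens is only illustrative and is not part of Cassandra or nodetool.

```python
# Sketch: evenly spaced tokens over a signed 64-bit ring.
# Assumption: the token range is [-2**63, 2**63 - 1], as the negative
# tokens in the ring output suggest. balanced_tokens is a hypothetical helper.

def balanced_tokens(node_count):
    """Return node_count tokens spaced evenly across the signed 64-bit range."""
    ring_size = 2**64
    return [-2**63 + (i * ring_size) // node_count for i in range(node_count)]


if __name__ == "__main__":
    for index, token in enumerate(balanced_tokens(5), start=1):
        print(f"node {index}: {token}")
```

With five nodes, the first three values match the tokens already held by .xx1 - .xx3, and the fourth is the target of the move command above.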


That was about 40 hours ago!


When I do nodetool netstats, I do see apparent progress:


jatyler@xx4:~$ nodetool netstats
Mode: MOVING
Not sending any streams.
Streaming from: /10.198.xx.xx2
   SyncCore: /var/cassandra/data/SyncCore/file-ic-31475-Data.db sections=1 progress=0/77699597 - 0%
   ...
   SyncCore: /var/cassandra/data/SyncCore/anotherFile-ic-32252-Data.db sections=1 progress=0/1254063427 - 0%
Read Repair Statistics:
Attempted: 8047367
Mismatch (Blocking): 97327
Mismatch (Background): 74369
Pool Name                    Active   Pending      Completed
Commands                        n/a         0      472255111
Responses                       n/a         1      749751322




I wrote 'apparent progress' because it reports "MOVING" and the Pending Commands/Responses are changing over time. However, I haven't seen the progress of any individual .db file go above 0%.
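
To track this over time rather than eyeballing each file, something like the following could total the progress= entries from the netstats output. This is only a sketch: it assumes the "progress=done/total" format shown above, and the helper name summarize_stream_progress and the SAMPLE text (copied from the two visible lines) are illustrative.

```python
import re

# Matches the "progress=<bytes done>/<bytes total>" field seen in the
# netstats output above. The format is assumed from that output only.
PROGRESS_RE = re.compile(r"progress=(\d+)/(\d+)")


def summarize_stream_progress(netstats_text):
    """Return (bytes_done, bytes_total, file_count) summed over all entries."""
    done = total = files = 0
    for match in PROGRESS_RE.finditer(netstats_text):
        done += int(match.group(1))
        total += int(match.group(2))
        files += 1
    return done, total, files


# Two of the lines from the output above (the elided ones are not included).
SAMPLE = """\
   SyncCore: /var/cassandra/data/SyncCore/file-ic-31475-Data.db sections=1 progress=0/77699597 - 0%
   SyncCore: /var/cassandra/data/SyncCore/anotherFile-ic-32252-Data.db sections=1 progress=0/1254063427 - 0%
"""

if __name__ == "__main__":
    done, total, files = summarize_stream_progress(SAMPLE)
    percent = 100.0 * done / total if total else 0.0
    print(f"{files} files, {done}/{total} bytes ({percent:.1f}%)")
```

Comparing the totals from two runs taken some minutes apart would show whether any bytes are actually arriving.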

Meanwhile, the system appears to have plenty of unused bandwidth, from 'iostat -x -m 1':

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    56.00 1338.00  171.00    57.59     0.89    79.36     0.57    0.38   0.17  25.30

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          22.77    1.82    2.35    0.20    0.00   72.86

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00  785.00    0.00    33.80     0.00    88.17     0.27    0.35   0.18  14.10

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          20.16    2.05    2.22    0.20    0.00   75.37
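
As a rough way to quantify "unused bandwidth" across more than two samples, the %util column could be pulled out and averaged. This is a sketch that assumes %util is the last column of each device row, as in the output above. The helper name device_utilization and the SAMPLE rows are illustrative.

```python
# Sketch: collect the %util column for one device from iostat -x output.
# Assumption: %util is the last field on each device row, as in the
# samples above. device_utilization is a hypothetical helper.

def device_utilization(iostat_text, device="sda"):
    """Return the %util value from every row that starts with the device name."""
    values = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            values.append(float(fields[-1]))
    return values


# The two sda rows from the output above.
SAMPLE = """\
sda               0.00    56.00 1338.00  171.00    57.59     0.89    79.36     0.57    0.38   0.17  25.30
sda               0.00     0.00  785.00    0.00    33.80     0.00    88.17     0.27    0.35   0.18  14.10
"""

if __name__ == "__main__":
    utilization = device_utilization(SAMPLE)
    average = sum(utilization) / len(utilization)
    print(f"samples: {utilization}, average %util: {average:.2f}")
```

Two one-second samples are a thin basis for a conclusion, so a longer capture would give a more trustworthy average.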




Is 40 hours too long for this move? Should I be seeing individual .db files report more progress? Should I start with the first box (even though the token appears correct)?


Any thoughts would be greatly appreciated.


THX


Cheers,

~Jason
*******