Subject: Re: nodetool repair taking forever
From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Date: Tue, 22 May 2012 21:05:18 +1200

> I also don't understand: if all these nodes are replicas of each other, why is it that the first node has almost double the data?

Have you performed any token moves? Old data is not deleted unless you run nodetool cleanup.

Another possibility is things like a lot of hints. Admittedly it would have to be a *lot* of hints.

The third is that compaction has fallen behind.

> This week it's even worse: the nodetool repair has been running for the last 15 hours just on the first node, and when I run nodetool compactionstats I constantly see this -
>
> pending tasks: 3

First, check the logs for errors.

Repair will first calculate the differences; you can see this as a validation compaction in nodetool compactionstats. Then it will stream the data; you can watch that with nodetool netstats.

Try to work out which part is taking the most time. 15 hours for 50 GB sounds like a long time (btw, do you have compaction on?)
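The two-phase breakdown above (validation, then streaming) can be checked from the command line. A minimal sketch — the `-h` host, the log path, and the scripted `awk` line are assumptions, and the parse is shown against a captured sample rather than a live node:

```shell
# Phase 1: validation (Merkle tree) compactions show up here:
#   nodetool -h localhost compactionstats
# Phase 2: streaming between replicas shows up here:
#   nodetool -h localhost netstats
# Errors during repair usually land in the system log (path is an assumption):
#   grep -iE 'error|exception|AntiEntropy' /var/log/cassandra/system.log

# For scripting, the pending-task count can be pulled out of the
# compactionstats output; shown here against a captured sample line:
sample='pending tasks: 3'
printf '%s\n' "$sample" | awk -F': ' '/^pending tasks/ {print $2}'
# -> 3
```

If the count stays flat for hours with no validation compaction listed and nothing moving in netstats, that points at the repair being stuck rather than merely slow.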
Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/05/2012, at 3:14 AM, Raj N wrote:

> Hi experts,
>
> I have a 6 node cluster spread across 2 DCs.
>
> DC   Rack   Status  State   Load      Owns    Token
>                                               113427455640312814857969558651062452225
> DC1  RAC13  Up      Normal  95.98 GB  33.33%  0
> DC2  RAC5   Up      Normal  50.79 GB  0.00%   1
> DC1  RAC18  Up      Normal  50.83 GB  33.33%  56713727820156407428984779325531226112
> DC2  RAC7   Up      Normal  50.74 GB  0.00%   56713727820156407428984779325531226113
> DC1  RAC19  Up      Normal  61.72 GB  33.33%  113427455640312814857969558651062452224
> DC2  RAC9   Up      Normal  50.83 GB  0.00%   113427455640312814857969558651062452225
>
> They are all replicas of each other. All reads and writes are done at LOCAL_QUORUM. We are on Cassandra 0.8.4. I see that our weekend nodetool repair runs for more than 12 hours, especially on the first node, which has 96 GB of data. Is this usual? We are using 500 GB SAS drives with an ext4 file system. This gets worse every week. This week it's even worse: the nodetool repair has been running for the last 15 hours just on the first node, and when I run nodetool compactionstats I constantly see this -
>
> pending tasks: 3
>
> and nothing else. Looks like it's just stuck. There's nothing substantial in the logs either. I also don't understand: if all these nodes are replicas of each other, why is it that the first node has almost double the data? Any help will be really appreciated.
>
> Thanks
> -Raj