From: aaron morton
To: user@cassandra.apache.org
Subject: Re: nodetool repair cassandra 0.8.4 HELP!!!
Date: Mon, 30 Apr 2012 16:09:34 +1200
Message-Id: <9BAD8877-82F1-4BA5-8657-0E7F03A756E2@thelastpickle.com>

When you start a node, does it log that it's opening SSTables?

After starting, what does nodetool cfstats say for the node?

Can you connect with cassandra-cli and do a get?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/04/2012, at 10:45 PM, Raj N wrote:

> I tried it on 1 column family. I believe there is a bug in 0.8* where repair ignores the cf. I tried this multiple times on different nodes. Every time, disk utilization went up to 80% on a 500 GB disk, and I would eventually kill the repair. I only have 60 GB worth of data. I see this JIRA -
>
> https://issues.apache.org/jira/browse/CASSANDRA-2324
>
> But that says it was fixed in 0.8 beta. Is this still broken in 0.8.4?
>
> I also don't understand why the data was inconsistent in the first place. I read and write at LOCAL_QUORUM.
>
> Thanks
> -Raj
>
> On Sun, Apr 29, 2012 at 2:06 AM, Watanabe Maki <watanabe.maki@gmail.com> wrote:
> You should run repair. If disk space is the problem, try to cleanup and major compact before repair.
> You can limit the streaming data by running repair for each column family separately.
>
> maki
>
> On 2012/04/28, at 23:47, Raj N <raj.cassandra@gmail.com> wrote:
>
> > I have a 6 node cassandra cluster DC1=3, DC2=3 with 60 GB data on each node. I was bulk loading data over the weekend, but we forgot to turn off the weekly nodetool repair job. As a result, repair was interfering while we were bulk loading data. I canceled repair by restarting the nodes. Unfortunately, after the restart it looked like I didn't have any data on those nodes when I used list on cassandra-cli. I ran repair on one of the affected nodes, but repair seems to be taking forever. Disk space has almost tripled. I stopped the repair again in fear of running out of disk space. After restart, the disk space is at 50%, whereas the good nodes are at 25%. How should I proceed from here? When I run list on cassandra-cli I do see data on the affected node. But how can I be sure I have all the data? Should I run repair again? Should I clean up the disk by clearing snapshots? Or should I just drop column families and bulk load the data again?
> >
> > Thanks
> > -Raj
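For anyone following the thread, the order Maki suggests (cleanup, then a major compaction, then repair one column family at a time) can be sketched as below. This is only a rough sketch: it builds the nodetool command lines and prints them without running anything. The host, keyspace, and column family names are hypothetical placeholders, and the helper name build_repair_plan is made up for illustration. Note Raj's report above that repair on 0.8.x may not honor the column family argument, so the per-column-family step may not limit streaming on that version.

```python
from typing import List, Sequence


def build_repair_plan(host: str, keyspace: str,
                      column_families: Sequence[str]) -> List[List[str]]:
    """Return nodetool invocations in the order suggested in the thread.

    Nothing is executed here; the caller decides whether to run them.
    """
    base = ["nodetool", "-h", host]
    plan = [
        base + ["cleanup", keyspace],   # drop data the node no longer owns
        base + ["compact", keyspace],   # major compaction to reclaim space
    ]
    # One repair per column family, to keep each streaming session small.
    for cf in column_families:
        plan.append(base + ["repair", keyspace, cf])
    return plan


if __name__ == "__main__":
    # Hypothetical names; substitute the real host, keyspace and CFs.
    for cmd in build_repair_plan("127.0.0.1", "MyKeyspace", ["Users", "Events"]):
        print(" ".join(cmd))
```

Printing the plan first makes it easy to review the commands and watch disk usage between steps before running each one by hand.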