Subject: Re: nodetool repair caused high disk space usage
From: Huy Le
To: user@cassandra.apache.org
Date: Mon, 22 Aug 2011 16:52:55 -0400

After having done so many tries, I am not sure which log entries correspond to what. However, there were many of this type:

 WARN [CompactionExecutor:14] 2011-08-18 18:47:00,596 CompactionManager.java (line 730) Index file contained a different key or row size; using key from data file

And there were out-of-disk-space errors (because hundreds of gigs were used up).

Anyhow, disk space usage is under control now, at least for 3 nodes so far; the repair on the last node is still running. Here is what I did that brought the disk space usage under control (the corresponding nodetool sequence is sketched after this list):

* Shut down Cassandra.
* Restored data from backup for 0.6.11.
* Started up Cassandra 0.6.11.
* Ran repair on all nodes, one at a time. After repair, the node that previously showed up in the ring as having 40GB of data now has 120GB. All other nodes showed the same amount of data. Not much disk space usage increase compared to what was seen before, just some Compacted files.
* Ran compact on all nodes.
* Drained all nodes.
* Shut down Cassandra.
* Started up Cassandra 0.8.4.
* Applied the schema to 0.8.4.
* Ran scrub on all nodes.
* Ran repair on each of the 4 nodes, one at a time (repair on the last node is still running). Data size shows up in the ring at the same size as it was in Cassandra 0.6.11. No disk space usage increase.

So it seems like the data was in an inconsistent state before it was upgraded to Cassandra 0.8.4, and somehow that triggered the 0.8.4 repair to send disk usage out of control. Or maybe the data is already consistent across the nodes, and running repair now does not do any data transfer.
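Roughly, the per-node command sequence was along these lines (a sketch only; <host> is a placeholder and I'm assuming the default JMX port on each node):

  # --- on 0.6.11, one node at a time ---
  nodetool -h <host> repair     # anti-entropy repair
  nodetool -h <host> compact    # major compaction
  nodetool -h <host> drain      # flush memtables, stop accepting writes
  # stop Cassandra 0.6.11, install 0.8.4, apply the schema, then:

  # --- on 0.8.4, one node at a time ---
  nodetool -h <host> scrub      # rebuild sstables after the upgrade
  nodetool -h <host> repair     # repair again on the upgraded cluster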
Once the repair for this last node is completed, I will start populating some data using our application. While using the app, I will randomly restart a few nodes, one at a time, to cause data to go missing on some nodes. Then I will run repair again to see if the disk usage is still under control.

Huy

On Fri, Aug 19, 2011 at 7:22 PM, Peter Schuller <peter.schuller@infidyne.com> wrote:
> > Is there any chance that the entire file from the source node got streamed to
> > the destination node even though only a small amount of data in the file from
> > the source node is supposed to be streamed to the destination node?
>
> Yes, but the thing that's annoying me is that even if so - you should
> not be seeing a 40 GB -> hundreds of gigs increase even if all
> neighbors sent all their data.
>
> Can you check system.log for references to these sstables to see when
> and under what circumstances they got written?
>
> --
> / Peter Schuller (@scode on twitter)

--
Huy Le
Spring Partners, Inc.
http://springpadit.com
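P.S. For checking system.log for references to those sstables, I plan to grep along these lines (a rough sketch; the sstable name and log path are placeholders for whatever applies on each node):

  # find flush/compaction/streaming log lines that mention a given data file
  grep -n '<sstable-name>-Data.db' /var/log/cassandra/system.log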