Subject: Re: nodetool repair caused high disk space usage
From: Huy Le
To: user@cassandra.apache.org
Date: Mon, 22 Aug 2011 16:52:55 -0400

After having done so many tries, I am not sure which log entries correspond to what. However, there were many of this type:

 WARN [CompactionExecutor:14] 2011-08-18 18:47:00,596 CompactionManager.java (line 730) Index file contained a different key or row size; using key from data file

And there were out-of-disk-space errors (because hundreds of gigs were used up).

Anyhow, disk space usage is under control now, at least for 3 nodes so far; the repair on the last node is still running. Here is what I did that brought the disk space usage under control (the corresponding nodetool sequence is sketched after this list):

* Shut down Cassandra.
* Restored data from backup for 0.6.11.
* Started up Cassandra 0.6.11.
* Ran repair on all nodes, one at a time. After repair, the node that previously showed up in the ring as having 40GB of data now has 120GB. All other nodes showed the same amount of data. Not much disk space usage increase compared to what was seen before, just some Compacted files.
* Ran compact on all nodes.
* Drained all nodes.
* Shut down Cassandra.
* Started up Cassandra 0.8.4.
* Applied the schema to 0.8.4.
* Ran scrub on all nodes.
* Ran repair on each of the 4 nodes, one at a time (repair on the last node is still running). Data size shows up in the ring at the same size as it was in Cassandra 0.6.11. No disk space usage increase.

So it seems like the data was in an inconsistent state before it was upgraded to Cassandra 0.8.4, and somehow that triggered the 0.8.4 repair to send disk usage out of control. Or maybe the data is already consistent across the nodes, and running repair now does not do any data transfer.
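Roughly, the per-node command sequence was along these lines (a sketch only; <host> is a placeholder and I'm assuming the default JMX port on each node):

  # --- on 0.6.11, one node at a time ---
  nodetool -h <host> repair     # anti-entropy repair
  nodetool -h <host> compact    # major compaction
  nodetool -h <host> drain      # flush memtables, stop accepting writes
  # stop Cassandra 0.6.11, install 0.8.4, apply the schema, then:

  # --- on 0.8.4, one node at a time ---
  nodetool -h <host> scrub      # rebuild sstables after the upgrade
  nodetool -h <host> repair     # repair again on the upgraded cluster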
Once the repair for this last node is completed, I will start populating some data using our application. While using the app, I will randomly restart a few nodes, one at a time, to cause data to go missing on some nodes. Then I will run repair again to see if the disk usage is still under control.

Huy

On Fri, Aug 19, 2011 at 7:22 PM, Peter Schuller <peter.schuller@infidyne.com> wrote:
> > Is there any chance that the entire file from the source node got streamed to
> > the destination node even though only a small amount of data in the file from
> > the source node is supposed to be streamed to the destination node?
>
> Yes, but the thing that's annoying me is that even if so - you should
> not be seeing a 40 GB -> hundreds of gigs increase even if all
> neighbors sent all their data.
>
> Can you check system.log for references to these sstables to see when
> and under what circumstances they got written?
>
> --
> / Peter Schuller (@scode on twitter)

--
Huy Le
Spring Partners, Inc.
http://springpadit.com
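P.S. For checking system.log for references to those sstables, I plan to grep along these lines (a rough sketch; the sstable name and log path are placeholders for whatever applies on each node):

  # find flush/compaction/streaming log lines that mention a given data file
  grep -n '<sstable-name>-Data.db' /var/log/cassandra/system.log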