Subject: Re: nodetool repair caused high disk space usage
From: Huy Le
To: user@cassandra.apache.org
Date: Fri, 19 Aug 2011 14:42:00 -0400

There were a few Compacted files. I thought that might have been the cause, but that wasn't it. We have a CF that is 23GB, and while repair is running, multiple instances of that CF are created along with other CFs.
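
(For anyone else chasing this, a check along these lines should surface leftover compacted sstables; the path assumes the default /var/lib/cassandra/data layout, and "MyKeyspace" is a placeholder:)

    # Marker files for sstables that were compacted but whose data files remain on disk
    find /var/lib/cassandra/data -name '*Compacted'

    # Disk usage per keyspace, plus the individual sstable generations for one keyspace,
    # to spot a CF whose sstables are being duplicated by repair/compaction churn
    du -sh /var/lib/cassandra/data/*
    ls -lh /var/lib/cassandra/data/MyKeyspace/*-Data.db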

I checked the stream directory across the cluster of four nodes, but it was empty.

I cannot reproduce this issue on version 0.6.11 with a copy of the data backed up prior to the 0.8.4 upgrade.

Currently repair is still running on two 0.6.11 nodes. My plan is to run compact across the cluster running 0.6.11. When that is done, I'll make another attempt to upgrade it to 0.8.4.
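
Concretely, the plan looks roughly like this (node1..node4 are placeholders for our hosts; assumes nodetool can reach JMX on the default port):

    # One 0.6.11 node at a time: force a major compaction, then check the reported load
    for host in node1 node2 node3 node4; do
        nodetool -h "$host" compact
        nodetool -h "$host" info
    done
    # Once disk usage settles, retry the upgrade to 0.8.4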

Huy

On Fri, Aug 19, 2011 at 2:26 PM, Peter Schuller <peter.schuller@infidyne.com> wrote:
> > After upgrading to cass 0.8.4 from cass 0.6.11, I ran scrub. That worked
> > fine. Then I ran nodetool repair on one of the nodes. The disk usage on
> > the data directory increased from 40GB to 480GB, and it's still growing.

> If you check your data directory, does it contain a lot of
> "*Compacted" files? It sounds like you're churning sstables from a
> combination of compactions/flushes (including triggered by repair) and
> the old ones aren't being deleted. I wonder if there is still some
> issue causing sstable retention.

> Since you're on 0.8.4, I'm a bit suspicious. I'd have to re-check each
> JIRA, but I think the major known repair problems should be fixed,
> except for CASSANDRA-2280, which is not your problem since you're going
> from a total load of 40 gig to hundreds of gigs (so even with all
> CFs streaming, that's unexpected).

> Do you have any old left-over streams active on the nodes? "nodetool
> netstats". If there are "stuck" streams, they might be causing sstable
> retention beyond what you'd expect.

> --
> / Peter Schuller (@scode on twitter)
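
(Following up on the netstats suggestion, a check across the four nodes would look something like this; node1..node4 are placeholders:)

    # Look for active or stuck streams on every node
    for host in node1 node2 node3 node4; do
        echo "== $host =="
        nodetool -h "$host" netstats
    done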



--
Huy Le
Spring Partners, Inc.
http://springpadit.com