From: Jonathan Colby <jonathan.colby@gmail.com>
Subject: Re: nodetool cleanup - results in more disk use?
Date: Mon, 4 Apr 2011 14:20:26 +0200
To: user@cassandra.apache.org

hi Aaron -

The Datastax documentation brought to light the fact that over time, major compactions will be performed on bigger and bigger SSTables. They actually recommend against performing too many major compactions, which is why I am wary of triggering them ...

http://www.datastax.com/docs/0.7/operations/scheduled_tasks

Performing Major Compaction

A major compaction process merges all SSTables for all column families in a keyspace – not just similar sized ones, as in minor compaction. Note that this may create extremely large SSTables that result in long intervals before the next minor compaction (and a resulting increase in CPU usage for each minor compaction).
Though a major compaction ultimately frees disk space used by accumulated SSTables, during runtime it can temporarily double disk space usage. It is best to run major compactions, if at all, at times of low demand on the cluster.

On Apr 4, 2011, at 1:57 PM, aaron morton wrote:

> cleanup reads each SSTable on disk and writes a new file that contains the same data, with the exception of rows that are no longer in a token range the node is a replica for. It's not compacting the files into fewer files or purging tombstones. But it is re-writing all the data for the CF.
>
> Part of the process will trigger GC if needed to free up disk space from SSTables no longer needed.
>
> AFAIK having fewer, bigger files will not cause longer minor compactions. Compaction thresholds are applied per bucket of files that share a similar size; there are normally more smaller files and fewer larger files.
>
> Aaron
>
> On 2 Apr 2011, at 01:45, Jonathan Colby wrote:
>
>> I discovered that a garbage collection cleans up the unused old SSTables. But I still wonder whether cleanup really does a full compaction. That would be undesirable if so.
>>
>> On Apr 1, 2011, at 4:08 PM, Jonathan Colby wrote:
>>
>>> I ran nodetool cleanup on a node in my cluster and discovered the disk usage went from 3.3 GB to 5.4 GB. Why is this?
>>>
>>> I thought cleanup just removed hinted handoff information. I read that *during* cleanup extra disk space will be used, similar to a compaction. But I was expecting the disk usage to go back down when it finished.
>>>
>>> I hope cleanup doesn't trigger a major compaction. I'd rather not run major compactions because it means future minor compactions will take longer and use more CPU and disk.
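To make Aaron's point concrete: size-tiered minor compaction only considers a bucket of similar-sized SSTables once the bucket reaches a threshold count, so one huge SSTable left behind by a major compaction just sits alone in its own bucket. The following is a minimal illustrative sketch of that bucketing idea, not Cassandra's actual code; the bucket bounds (0.5x to 1.5x of the bucket's average size) and the threshold of 4 are assumptions for the example.

```python
def bucket_by_size(sstable_sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of 'similar size': a file joins a
    bucket if it falls within [bucket_low, bucket_high] times the bucket's
    running average size; otherwise it starts a new bucket."""
    buckets = []  # list of (average_size, [member_sizes])
    for size in sorted(sstable_sizes):
        for i, (avg, members) in enumerate(buckets):
            if bucket_low * avg <= size <= bucket_high * avg:
                members.append(size)
                buckets[i] = (sum(members) / len(members), members)
                break
        else:
            buckets.append((size, [size]))
    return [members for _, members in buckets]


def compaction_candidates(sstable_sizes, min_threshold=4):
    """Only buckets holding at least min_threshold files are candidates for
    a minor compaction, so a single giant SSTable is simply never touched
    (it doesn't make minor compactions slower, it just waits alone)."""
    return [b for b in bucket_by_size(sstable_sizes) if len(b) >= min_threshold]


# Four ~10 MB files form one compactable bucket; the 5000 MB file from an
# earlier major compaction lands in its own bucket and is left alone.
print(compaction_candidates([10, 11, 12, 10, 5000]))
```

Under these assumed parameters, the large file only becomes involved again once enough similarly huge SSTables accumulate, which is consistent with the long intervals the Datastax text warns about.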