From: Jonathan Colby <jonathan.colby@gmail.com>
Subject: Re: nodetool cleanup - results in more disk use?
Date: Mon, 4 Apr 2011 14:20:26 +0200
To: user@cassandra.apache.org

hi Aaron -

The Datastax documentation brought to light the fact that over time, major compactions will be performed on bigger and bigger SSTables. They actually recommend against performing too many major compactions, which is why I am wary of triggering them ...

http://www.datastax.com/docs/0.7/operations/scheduled_tasks

Performing Major Compaction

A major compaction process merges all SSTables for all column families in a keyspace – not just similar sized ones, as in minor compaction. Note that this may create extremely large SSTables that result in long intervals before the next minor compaction (and a resulting increase in CPU usage for each minor compaction).
Though a major compaction ultimately frees disk space used by accumulated SSTables, during runtime it can temporarily double disk space usage. It is best to run major compactions, if at all, at times of low demand on the cluster.

On Apr 4, 2011, at 1:57 PM, aaron morton wrote:

> cleanup reads each SSTable on disk and writes a new file that contains the same data, with the exception of rows that are no longer in a token range the node is a replica for. It's not compacting the files into fewer files or purging tombstones. But it is re-writing all the data for the CF.
>
> Part of the process will trigger GC if needed to free up disk space from SSTables no longer needed.
>
> AFAIK having fewer, bigger files will not cause longer minor compactions. Compaction thresholds are applied per bucket of files that share a similar size; there are normally more smaller files and fewer larger files.
>
> Aaron
>
> On 2 Apr 2011, at 01:45, Jonathan Colby wrote:
>
>> I discovered that a garbage collection cleans up the unused old SSTables. But I still wonder whether cleanup really does a full compaction. That would be undesirable if so.
>>
>> On Apr 1, 2011, at 4:08 PM, Jonathan Colby wrote:
>>
>>> I ran nodetool cleanup on a node in my cluster and discovered the disk usage went from 3.3 GB to 5.4 GB. Why is this?
>>>
>>> I thought cleanup just removed hinted handoff information. I read that *during* cleanup extra disk space will be used, similar to a compaction. But I was expecting the disk usage to go back down when it finished.
>>>
>>> I hope cleanup doesn't trigger a major compaction. I'd rather not run major compactions because it means future minor compactions will take longer and use more CPU and disk.
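To make Aaron's point concrete: size-tiered minor compaction only considers a bucket of similar-sized SSTables once the bucket reaches a threshold count, so one huge SSTable left behind by a major compaction just sits alone in its own bucket. The following is a minimal illustrative sketch of that bucketing idea, not Cassandra's actual code; the bucket bounds (0.5x to 1.5x of the bucket's average size) and the threshold of 4 are assumptions for the example.

```python
def bucket_by_size(sstable_sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of 'similar size': a file joins a
    bucket if it falls within [bucket_low, bucket_high] times the bucket's
    running average size; otherwise it starts a new bucket."""
    buckets = []  # list of (average_size, [member_sizes])
    for size in sorted(sstable_sizes):
        for i, (avg, members) in enumerate(buckets):
            if bucket_low * avg <= size <= bucket_high * avg:
                members.append(size)
                buckets[i] = (sum(members) / len(members), members)
                break
        else:
            buckets.append((size, [size]))
    return [members for _, members in buckets]


def compaction_candidates(sstable_sizes, min_threshold=4):
    """Only buckets holding at least min_threshold files are candidates for
    a minor compaction, so a single giant SSTable is simply never touched
    (it doesn't make minor compactions slower, it just waits alone)."""
    return [b for b in bucket_by_size(sstable_sizes) if len(b) >= min_threshold]


# Four ~10 MB files form one compactable bucket; the 5000 MB file from an
# earlier major compaction lands in its own bucket and is left alone.
print(compaction_candidates([10, 11, 12, 10, 5000]))
```

Under these assumed parameters, the large file only becomes involved again once enough similarly huge SSTables accumulate, which is consistent with the long intervals the Datastax text warns about.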