Date: Thu, 9 Dec 2010 12:13:28 -0800 (PST)
From: Scott Dworkis <svd@mylife.com>
To: Rustam Aliyev
Cc: user@cassandra.apache.org
Subject: Re: Cassandra and disk space
In-Reply-To: <4D011DDA.30404@code.az>

I recently finished a practice expansion from 4 nodes to 5: a series of "nodetool move", "nodetool cleanup", and JMX GC steps. I found that during some of the steps, disk usage on one of the nodes actually grew to 2.5x the base data size. I'm using 0.6.4.

-scott
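(For archive readers: a rough sketch of the kind of rebalance sequence Scott describes, using 0.6-era nodetool. The hostnames, JMX port, and tokens below are illustrative placeholders, not his actual values.)

  # Sketch of a 4 -> 5 node rebalance on 0.6. With RandomPartitioner,
  # the ideal token for node i of N is i * 2**127 / N, so for 5 nodes:
  nodetool -host node1 -port 8080 move 0
  nodetool -host node2 -port 8080 move 34028236692093846346337460743176821145
  nodetool -host node3 -port 8080 move 68056473384187692692674921486353642291
  nodetool -host node4 -port 8080 move 102084710076281539039012382229530463436
  nodetool -host node5 -port 8080 move 136112946768375385385349842972707284582

  # After the moves, drop the ranges each node no longer owns.
  nodetool -host node1 -port 8080 cleanup    # repeat for each node

  # On 0.6, obsolete SSTables are only unlinked after the JVM
  # garbage-collects their references, so invoking a full GC over JMX
  # (the gc() operation on java.lang:type=Memory, e.g. from jconsole)
  # reclaims the disk space sooner.

Moving one node at a time keeps only a single range transfer in flight, which limits how much temporary disk any one node needs during the rebalance.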
On Thu, 9 Dec 2010, Rustam Aliyev wrote:

> Are there any plans to improve this in the future?
>
> For big data clusters this could be very expensive. Based on your comment, I will need 200TB of storage for 100TB of data to keep Cassandra running.
>
> --
> Rustam.
>
> On 09/12/2010 17:56, Tyler Hobbs wrote:
>> If you are on 0.6, repair is particularly dangerous with respect to disk space usage. If your replica is sufficiently out of sync, you can easily triple your disk usage. This has been improved in 0.7, so repairs should use about half as much disk space, on average.
>>
>> In general, yes, keep your nodes under 50% disk usage at all times. Any of compaction, cleanup, snapshotting, repair, or bootstrapping (the latter two are improved in 0.7) can double your disk usage temporarily.
>>
>> You should plan to add more disk space or add nodes when you get close to this limit. Once you go over 50%, it's more difficult to add nodes, at least in 0.6.
>>
>> - Tyler
>>
>> On Thu, Dec 9, 2010 at 11:19 AM, Mark wrote:
>>> I recently ran into a problem during a repair operation where my nodes completely ran out of space and my whole cluster was... well, clusterfucked.
>>>
>>> I want to make sure I know how to prevent this problem in the future.
>>>
>>> Should I make sure that every node stays under 50% disk usage at all times? Are there any normal day-to-day operations that would cause any one node to double in size that I should be aware of? If one or more nodes surpass the 50% mark, what should I plan to do?
>>>
>>> Thanks for any advice
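(A minimal sketch of how the 50% headroom rule above could be watched automatically. The data directory path and threshold are assumptions; adjust them to match your own storage-conf.xml and comfort level.)

  #!/bin/sh
  # Warn when the Cassandra data volume passes a headroom threshold,
  # since compaction, cleanup, repair, or bootstrap can temporarily
  # need roughly as much free space as the data already occupies.
  DATA_DIR=/var/lib/cassandra/data   # assumed path; adjust as needed
  THRESHOLD=50                       # percent used, per the advice above

  USED=$(df -P "$DATA_DIR" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
  if [ "$USED" -gt "$THRESHOLD" ]; then
      echo "WARNING: $DATA_DIR is ${USED}% full (over ${THRESHOLD}%);" \
           "add disk or nodes before the next repair/compaction." >&2
  fi

Run from cron on each node; it prints nothing while usage stays under the threshold, so any output can be routed straight to an alert.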