incubator-cassandra-user mailing list archives

From John Watson <j...@disqus.com>
Subject Re: cassandra-shuffle time to completion and required disk space
Date Mon, 29 Apr 2013 17:08:50 GMT
That's what we tried first, before the shuffle, and we ran into the space issue.

That's detailed in another thread titled: "Adding nodes in 1.2 with vnodes
requires huge disks"
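The space issue mentioned here matches the QA observation in the quoted message: each machine needed enough free disk space to potentially hold the entire cluster's sstables. That worst case can be expressed as a rough pre-flight check. This is a minimal sketch under that assumption only. The function name and the per-node byte counts are hypothetical inputs you would gather yourself; nothing here calls a Cassandra API.

```python
from typing import Dict, List


def nodes_short_on_space(used_bytes: Dict[str, int],
                         free_bytes: Dict[str, int]) -> List[str]:
    """Hypothetical worst-case check based on the thread's observation:
    every node may need free space equal to the whole cluster's sstable size.

    used_bytes: sstable bytes currently stored on each node (assumed input).
    free_bytes: free disk bytes on each node (assumed input).
    Returns the sorted list of hosts that would fall short.
    """
    # Worst case from the thread: total data across all nodes.
    cluster_total = sum(used_bytes.values())
    return sorted(host for host, free in free_bytes.items()
                  if free < cluster_total)


if __name__ == "__main__":
    used = {"node-a": 100, "node-b": 100}
    free = {"node-a": 250, "node-b": 150}
    print(nodes_short_on_space(used, free))
```

In this toy input the cluster holds 200 units of sstables, so only `node-b` (150 free) is flagged. Real requirements depend on how much data actually lands on each node during the operation; this only encodes the pessimistic bound described in the thread.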


On Mon, Apr 29, 2013 at 4:08 AM, Sam Overton <sam@acunu.com> wrote:

> An alternative to running shuffle is to do a rolling
> bootstrap/decommission. You would set num_tokens on the existing hosts (and
> restart them) so that they split their ranges, then bootstrap in N new
> hosts, then decommission the old ones.
>
>
>
> On 28 April 2013 22:21, John Watson <john@disqus.com> wrote:
>
>> The amount of time/space cassandra-shuffle requires when upgrading to
>> using vnodes should really be made clear in the documentation (once some is written).
>>
>> The only semi-noticeable remark about the exorbitant amount of time is a
>> bullet point in: http://wiki.apache.org/cassandra/VirtualNodes/Balance
>>
>> "Shuffling will entail moving a lot of data around the cluster and so has
>> the potential to consume a lot of disk and network I/O, and to take a
>> considerable amount of time. For this to be an online operation, the
>> shuffle will need to operate on a lower priority basis to other streaming
>> operations, and should be expected to take days or weeks to complete."
>>
>> We tried running shuffle on a QA version of our cluster and 2 things were
>> brought to light:
>>  - Even with no reads/writes it was going to take 20 days
>>  - Each machine needed enough free disk space to potentially hold the
>> entire cluster's sstables on disk
>>
>> Regards,
>>
>> John
>>
>
>
>
> --
> Sam Overton
> Acunu | http://www.acunu.com | @acunu
>
