incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: cassandra-shuffle time to completion and required disk space
Date Sun, 28 Apr 2013 21:52:28 GMT
Can you provide some info on the number of nodes, node load, cluster load, etc.?

AFAIK shuffle was not an easy thing to test, and it does not get much real-world use: only
some people will run it, and they (normally) run it only once.

Any info you can provide may help improve the process. 

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/04/2013, at 9:21 AM, John Watson <john@disqus.com> wrote:

> The amount of time and disk space cassandra-shuffle requires when upgrading to vnodes should
> really be made apparent in the documentation (once some is written).
> 
> The only semi-noticeable remark about the exorbitant amount of time is a bullet point in:
> http://wiki.apache.org/cassandra/VirtualNodes/Balance
> 
> "Shuffling will entail moving a lot of data around the cluster and so has the potential
> to consume a lot of disk and network I/O, and to take a considerable amount of time. For this
> to be an online operation, the shuffle will need to operate on a lower priority basis to other
> streaming operations, and should be expected to take days or weeks to complete."
> 
> We tried running shuffle on a QA version of our cluster, and two things came to light:
>  - Even with no reads/writes, it was going to take 20 days
>  - Each machine needed enough free disk space to potentially hold the entire cluster's
>    sstables
> 
> Regards,
> 
> John

