incubator-cassandra-user mailing list archives

From John Watson <j...@disqus.com>
Subject cassandra-shuffle time to completion and required disk space
Date Sun, 28 Apr 2013 21:21:20 GMT
The amount of time and disk space that cassandra-shuffle requires when
upgrading to vnodes should be stated clearly in the documentation (once some
is written).

The only remark I could find about the exorbitant amount of time is an easily
missed bullet point at: http://wiki.apache.org/cassandra/VirtualNodes/Balance

"Shuffling will entail moving a lot of data around the cluster and so has
the potential to consume a lot of disk and network I/O, and to take a
considerable amount of time. For this to be an online operation, the
shuffle will need to operate on a lower priority basis to other streaming
operations, and should be expected to take days or weeks to complete."

We tried running shuffle on a QA version of our cluster and two things came
to light:
 - Even with no reads/writes, it was going to take 20 days
 - Each machine needed enough free disk space to potentially hold the entire
cluster's sstables on disk
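
For anyone wanting to sanity-check the disk space point before starting a
shuffle, here is a rough sketch of the worst case described above: a node is
only considered safe if its free space could hold the sstables of the whole
cluster. The function names and the per-node sizes are hypothetical, made up
for illustration; they are not part of cassandra-shuffle or any Cassandra
tooling, and the rule itself is just what we observed on our QA cluster.

```python
import shutil


def free_bytes(path):
    """Return the free space, in bytes, on the filesystem containing `path`."""
    return shutil.disk_usage(path).free


def can_hold_cluster(free, node_sstable_bytes):
    """Worst-case check (an assumption based on one QA run): the node needs
    at least as much free space as the sstables of every node combined."""
    return free >= sum(node_sstable_bytes)


if __name__ == "__main__":
    # Hypothetical per-node sstable sizes in bytes; substitute real figures.
    sizes = [200 * 10**9, 180 * 10**9, 220 * 10**9]
    # "." stands in for the Cassandra data directory, which varies by install.
    print(can_hold_cluster(free_bytes("."), sizes))
```

The per-node sizes have to be gathered by hand; the sketch only does the
comparison.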

Regards,

John
