From: aaron morton <aa...@thelastpickle.com>
Subject: Re: Observation on shuffling vs adding/removing nodes
Date: Sun, 24 Mar 2013 17:14:51 GMT
> We initially tried to run a shuffle, however it seemed to be going really slowly
> (very little progress by watching "cassandra-shuffle ls | wc -l" after 5-6 hours
> and no errors in logs),
My guess is that shuffle is not designed to be as efficient as possible, since it is only intended to be run once.
Was it continuing to make progress?
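
If you re-run it, one way to tell whether it is actually moving is to sample the
pending relocation count over time, e.g. (a rough sketch, assuming cassandra-shuffle
is on the PATH and pointed at the same host/port you used before):

    # Print a timestamped count of pending relocations every 10 minutes.
    # A steadily decreasing count means the shuffle is making progress.
    while true; do
        echo "$(date) $(cassandra-shuffle ls | wc -l)"
        sleep 600
    done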

> so we cancelled it and instead added 3 nodes to the cluster, waited for them to
> bootstrap, and then decommissioned the first 3 nodes.
You added the 3 nodes with num_tokens set in the yaml file?
What does nodetool status say?
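
For reference, the sequence I'd expect for growing into vnodes this way is roughly
the following (just a sketch; 256 is the commonly used num_tokens value, not
something your setup requires):

    # In cassandra.yaml on each NEW node, before its first start:
    #   num_tokens: 256
    #   (leave initial_token unset)

    # Once the new nodes have finished bootstrapping, confirm each of
    # them reports the expected token count:
    nodetool status

    # Then retire the old nodes one at a time, running this on each:
    nodetool decommission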

Cheers
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/03/2013, at 9:41 AM, Andrew Bialecki <andrew.bialecki@gmail.com> wrote:

> Just curious if anyone has any thoughts on something we've observed in a small test cluster.
> 
> We had around 100 GB of data on a 3 node cluster (RF=2) and wanted to start using
> vnodes. We upgraded the cluster to 1.2.2 and then followed the instructions for
> using vnodes. We initially tried to run a shuffle, however it seemed to be going
> really slowly (very little progress by watching "cassandra-shuffle ls | wc -l"
> after 5-6 hours and no errors in logs), so we cancelled it and instead added 3
> nodes to the cluster, waited for them to bootstrap, and then decommissioned the
> first 3 nodes. The total process took about 3 hours. My assumption is that the
> final result is the same in terms of data being distributed somewhat randomly
> across the nodes (assuming no bias in the token ranges selected when bootstrapping
> a node).
> 
> If that assumption is correct, the observation would be that, where possible,
> adding nodes and then removing nodes appears to be a faster way to shuffle data
> for small clusters. It's obviously not always possible, but I thought I'd throw
> this out there in case anyone runs into a similar situation. This cluster is,
> unsurprisingly, on EC2 instances, which made provisioning and shutting down nodes
> extremely easy.
> 
> Cheers,
> Andrew

