From Eric Evans
Subject Re: Problems with shuffle
Date Mon, 08 Apr 2013 14:00:01 GMT
[ Rustam Aliyev ]
> Hi,
> After upgrading to the vnodes I created and enabled shuffle
> operation as suggested. After running for a couple of hours I had to
> disable it because nodes were not catching up with compactions. I
> repeated this process 3 times (enable/disable).
> I have 5 nodes and each of them had ~35GB. After shuffle operations
> described above some nodes are now reaching ~170GB. In the log files
> I can see same files transferred 2-4 times to the same host within
> the same shuffle session. Worst of all, after all of these I had
> only 20 vnodes transferred out of 1280. So if it will continue at
> the same speed it will take about a month or two to complete
> shuffle.

As Edward says, you'll need to issue a cleanup post-shuffle if you expect
to see disk usage match your expectations.
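
A minimal sketch of that cleanup pass, in Python, assuming nodetool is
on the PATH and that running the nodes one after another is acceptable
(cleanup rewrites SSTables, so it adds I/O and compaction load). The
run_cleanup helper and any host names you pass to it are hypothetical;
the only real command here is `nodetool -h <host> cleanup`:

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def cleanup_command(host: str) -> List[str]:
    """Build the nodetool invocation that runs cleanup on one node."""
    return ["nodetool", "-h", host, "cleanup"]


def run_cleanup(
    hosts: Sequence[str],
    runner: Optional[Callable[[List[str]], object]] = None,
) -> List[List[str]]:
    """Run cleanup on each host in turn and return the commands issued.

    `runner` is a hypothetical hook so the loop can be dry-run; by
    default each command is executed and a non-zero exit raises.
    """
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd, check=True)
    issued = []
    for host in hosts:
        cmd = cleanup_command(host)
        runner(cmd)  # blocks until this node's cleanup returns
        issued.append(cmd)
    return issued
```

Passing `runner=print` gives a dry run that only shows the commands.
Cleanup is node-local, so it has to be issued against every node that
gave up ranges, not just one.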

> I had few question to better understand shuffle:
> 1. Does disabling and re-enabling shuffle starts shuffle process from
>    scratch or it resumes from the last point?

It resumes.

> 2. Will vnode reallocations speedup as shuffle proceeds or it will
>    remain the same?

The shuffle proceeds synchronously, one range at a time; it's not going
to speed up as it progresses.
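
Since the pace stays roughly constant, the remaining time can be
extrapolated linearly from what has completed so far. A rough sketch:
the 20 and 1280 are the figures from your message, while the 6.0
elapsed hours is purely an assumed number for illustration (plug in the
time shuffle was actually enabled), and estimate_remaining_hours is a
hypothetical helper, not part of any Cassandra tool:

```python
def estimate_remaining_hours(done: int, total: int, elapsed_hours: float) -> float:
    """Linear extrapolation of the time left, at the observed average rate."""
    if not 0 < done <= total:
        raise ValueError("done must be between 1 and total")
    if elapsed_hours < 0:
        raise ValueError("elapsed_hours must be non-negative")
    # hours per completed range, times the ranges still to move
    return elapsed_hours * (total - done) / done


# 20 of 1280 ranges come from the thread; 6.0 hours is an assumption.
remaining = estimate_remaining_hours(20, 1280, 6.0)
print(f"{remaining:.0f} hours, about {remaining / 24:.1f} days")
```

This ignores any slowdown from nodes falling behind on compaction, so
treat the result as a lower bound rather than a prediction.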

> 3. Why I see multiple transfers of the same file to the same host? e.g.:
>    INFO [Streaming to /] 2013-04-07 14:27:10,038
> (line 44) Successfully sent
>    /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
>    to /
>    INFO [Streaming to /] 2013-04-07 16:27:07,427
> (line 44) Successfully sent
>    /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
>    to /

I'm not sure, but perhaps that file contained data for two different
ranges.

> 4. When I enable/disable shuffle I receive warning message such as
>    below. Do I need to worry about it?
>    cassandra-shuffle -h localhost disable
>    Failed to enable shuffling on!
>    Failed to enable shuffling on!

Is that the verbatim output?  Did it report failing to enable when you
tried to disable?

As a rule of thumb though, you don't want a disable/enable to result in
only a subset of nodes shuffling.  Are there no other errors?  What do
the logs say?

> I couldn't find many docs on shuffle, only read through JIRA and
> original proposal by Eric.

Eric Evans

