From: Michał Michalski <michalm@opera.com>
Date: Thu, 04 Jul 2013 13:36:28 +0200
To: user@cassandra.apache.org
Subject: Re: going down from RF=3 to RF=2, repair constantly falls over with JVM OOM

I don't think you need to run repair if you decrease RF. At least I wouldn't do it.

In the case of *decreasing* RF you have 3 nodes containing some data, but only 2 of them should store it from now on, so you should rather run cleanup instead of repair, to get rid of the data on the 3rd replica. And I guess it should work (in terms of disk space and memory), if you've been able to perform compaction.

Repair makes sense if you *increase* RF, so the data are streamed to the new replicas.

M.
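For reference, roughly the sequence I mean ("MyKS" and <node-address> are placeholders; the cassandra-cli syntax below is from memory for the 1.0 branch, so double-check it against your version):

    # 1. Lower the replication factor for the keyspace (cassandra-cli):
    update keyspace MyKS with strategy_options = {replication_factor:2};

    # 2. Then, on each node in turn, drop the data it no longer owns:
    nodetool -h <node-address> cleanup

Cleanup only rewrites the node's local SSTables; unlike repair it doesn't stream data between nodes, which is why it tends to be much gentler on nodes that are already short on disk and memory.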
On 04.07.2013 12:20, Evan Dandrea wrote:
> Hi,
>
> We've made the mistake of letting our nodes get too large, now holding
> about 3TB each. We ran out of enough free space to have a successful
> compaction, and because we're on 1.0.7, enabling compression to get
> out of the mess wasn't feasible. We tried adding another node, but we
> think this may have put too much pressure on the existing ones it was
> replicating from, so we backed out.
>
> So we decided to drop RF down to 2 from 3 to relieve the disk pressure
> and started building a secondary cluster with lots of 1 TB nodes. We
> ran repair -pr on each node, but it's failing with a JVM OOM on one
> node while another node is streaming from it for the final repair.
>
> Does anyone know what we can tune to get the cluster stable enough to
> put it in a multi-dc setup with the secondary cluster? Do we actually
> need to wait for these RF3->RF2 repairs to stabilize, or could we
> point it at the secondary cluster without worry of data loss?
>
> We've set the heap on these two problematic nodes to 20GB, up from the
> equally too high 12GB, but we're still hitting OOM. I had seen in
> other threads that tuning down compaction might help, so we're trying
> the following:
>
> in_memory_compaction_limit_in_mb 32 (down from 64)
> compaction_throughput_mb_per_sec 8 (down from 16)
> concurrent_compactors 2 (the nodes have 24 cores)
> flush_largest_memtables_at 0.45 (down from 0.50)
> stream_throughput_outbound_megabits_per_sec 300 (down from 400)
> reduce_cache_sizes_at 0.5 (down from 0.6)
> reduce_cache_capacity_to 0.35 (down from 0.4)
>
> -XX:CMSInitiatingOccupancyFraction=30
>
> Here's the log from the most recent repair failure:
>
> http://paste.ubuntu.com/5843017/
>
> The OOM starts at line 13401.
>
> Thanks for whatever insight you can provide.
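For anyone following along: the settings Evan lists above live in cassandra.yaml, and the CMS flag and heap size are set in conf/cassandra-env.sh. A rough sketch of the relevant fragments with his values (file layout from memory for 1.0.x, so verify against your install):

    # conf/cassandra.yaml (values from the message above)
    in_memory_compaction_limit_in_mb: 32
    compaction_throughput_mb_per_sec: 8
    concurrent_compactors: 2
    flush_largest_memtables_at: 0.45
    stream_throughput_outbound_megabits_per_sec: 300
    reduce_cache_sizes_at: 0.5
    reduce_cache_capacity_to: 0.35

    # conf/cassandra-env.sh
    MAX_HEAP_SIZE="20G"
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=30"

All of these require a node restart to take effect, except compaction and streaming throughput, which can also be adjusted at runtime via nodetool setcompactionthroughput / setstreamthroughput.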