incubator-cassandra-user mailing list archives

From Dane Miller <d...@optimalsocial.com>
Subject Re: Stream fails during repair, two nodes out-of-memory
Date Sat, 23 Mar 2013 20:12:29 GMT
On Fri, Mar 22, 2013 at 5:58 PM, Wei Zhu <wz1975@yahoo.com> wrote:
> compaction needs some disk I/O. Slowing down compaction will improve overall
> system performance. Of course, you don't want to go too slow and fall too far behind.

Hmm.  Even after making the suggested configuration changes, repair
still fails with OOM (but only one node died this time, which is an
improvement).  It looks like we hit OOM when repair starts streaming
multiple cfs simultaneously.  Just prior to OOM, the node loses
contact with another node in the cluster and starts storing hints.

I'm wondering if I should throttle streaming, and/or repair only one
CF at a time.
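Both of those are possible from nodetool. A rough sketch of what that could look like (commands from the Cassandra 1.2-era nodetool; the keyspace/column family names are placeholders, and `setstreamthroughput` may not exist on older versions, so check `nodetool help` first):

```shell
# Cap streaming throughput cluster-wide, in megabits/sec, without a restart.
nodetool setstreamthroughput 10

# The equivalent permanent setting lives in cassandra.yaml:
#   stream_throughput_outbound_megabits_per_sec: 10

# Repair a single column family at a time, primary range only, to limit
# how many CFs stream simultaneously ("MyKeyspace"/"my_cf" are placeholders):
nodetool repair -pr MyKeyspace my_cf
```

Repairing per-CF trades one long repair for several smaller ones, which keeps the number of concurrent streams (and their memory footprint) down.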

> From: "Dane Miller"
> Subject: Re: Stream fails during repair, two nodes out-of-memory
>
> On Thu, Mar 21, 2013 at 10:28 AM, aaron morton <aaron@thelastpickle.com> wrote:
>> heap of 1867M is kind of small. According to the discussion on this list,
>> it's advisable to have m1.xlarge.
>>
>> +1
>>
>> In cassandra-env.sh set the MAX_HEAP_SIZE to 4G, and the HEAP_NEWSIZE to
>> 400M
>>
>> In the yaml file set
>>
>> in_memory_compaction_limit_in_mb to 32
>> compaction_throughput_mb_per_sec to 8
>> concurrent_compactors to 2
>>
>> This will slow down compaction a lot. You may want to restore some of these
>> settings once you have things stable.
>>
>> You have an underpowered box for what you are trying to do.
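[For reference, the quoted advice maps onto the config files roughly as follows; a sketch only, and the exact paths vary by install:]

```shell
# conf/cassandra-env.sh
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="400M"

# conf/cassandra.yaml
#   in_memory_compaction_limit_in_mb: 32
#   compaction_throughput_mb_per_sec: 8
#   concurrent_compactors: 2
```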
>
> Thanks very much for the info.  Have made the changes and am retrying.
> I'd like to understand: why does it help to slow compaction?
>
> It does seem like the cluster is underpowered to handle our
> application's full write load plus repairs, but it operates fine
> otherwise.
>
> On Wed, Mar 20, 2013 at 8:47 PM, Wei Zhu <wz1975@yahoo.com> wrote:
>> It's clear you are out of memory. How big is your data size?
>
> 120 GB per node, of which 50% is actively written/updated, and 50% is
> read-mostly.
>
> Dane
>
