cassandra-user mailing list archives

From Maxime <maxim...@gmail.com>
Subject Re: OOM at Bootstrap Time
Date Sun, 26 Oct 2014 02:15:52 GMT
Thanks a lot, that is comforting. We are also small at the moment, so I
can definitely relate to the idea of keeping things small and simple at a
level where it just works.

I see the new Apache version has a lot of fixes so I will try to upgrade
before I look into downgrading.

On Saturday, October 25, 2014, Laing, Michael <michael.laing@nytimes.com>
wrote:

> Since no one else has stepped in...
>
> We have run clusters with ridiculously small nodes - I have a production
> cluster in AWS with 4GB nodes each with 1 CPU and disk-based instance
> storage. It works fine but you can see those little puppies struggle...
>
> And I ran into problems such as you observe...
>
> Upgrading Java to the latest 1.7 and - most importantly - *reverting to
> the default configuration, esp. for heap*, seemed to settle things down
> completely. Also make sure that you are using the 'recommended production
> settings' from the docs on your boxen.
>
> However we are running 2.0.x not 2.1.0 so YMMV.
>
> And we are switching to 15GB nodes w 2 heftier CPUs each and SSD storage -
> still a 'small' machine, but much more reasonable for C*.
>
> However I can't say I am an expert, since I deliberately keep things so
> simple that we do not encounter problems - it just works so I dig into
> other stuff.
>
> ml
>
>
> On Sat, Oct 25, 2014 at 5:22 PM, Maxime <maximelb@gmail.com> wrote:
>
>> Hello, I've been trying to add a new node to my cluster ( 4 nodes ) for a
>> few days now.
>>
>> I started by adding a node similar to my current configuration, 4 GB of
>> RAM + 2 Cores on DigitalOcean. However, every time, I would end up getting
>> OOM errors after many log entries of the type:
>>
>> INFO  [SlabPoolCleaner] 2014-10-25 13:44:57,240
>> ColumnFamilyStore.java:856 - Enqueuing flush of mycf: 5383 (0%) on-heap, 0
>> (0%) off-heap
>>
>> leading to:
>>
>> ka-120-Data.db (39291 bytes) for commitlog position
>> ReplayPosition(segmentId=1414243978538, position=23699418)
>> WARN  [SharedPool-Worker-13] 2014-10-25 13:48:18,032
>> AbstractTracingAwareExecutorService.java:167 - Uncaught exception on thread
>> Thread[SharedPool-Worker-13,5,main]: {}
>> java.lang.OutOfMemoryError: Java heap space
>>
>> Thinking it had to do with either compaction or streaming, 2 activities
>> I've had tremendous issues with in the past, I tried lowering
>> setstreamthroughput to extremely low values, all the way down to 5. I
>> also tried setting setcompactionthroughput to 0 and then, after reading
>> that in some cases that might be too fast, down to 8. Nothing worked; it
>> merely vaguely changed the mean time to OOM, but not in a way indicating
>> either was anywhere near a solution.
>>
>> The nodes were configured with 2 GB of Heap initially, I tried to crank
>> it up to 3 GB, stressing the host memory to its limit.
>>
>> After doing some exploration (I am considering writing a Cassandra Ops
>> documentation with lessons learned since there seems to be little of it in
>> organized fashions), I read that some people had strange issues on
>> lower-end boxes like that, so I bit the bullet and upgraded my new node to
>> an 8GB + 4 Core instance, which was anecdotally better.
>>
>> To my complete shock, the exact same issues are present, even after
>> raising the Heap memory to 6 GB. I figure it can't be a "normal" situation
>> anymore, but must be a bug somehow.
>>
>> My cluster is 4 nodes, RF of 2, about 160 GB of data across all nodes.
>> About 10 CF of varying sizes. Runtime writes are between 300 to 900 /
>> second. Cassandra 2.1.0, nothing too wild.
>>
>> Has anyone encountered these kinds of issues before? I would really enjoy
>> hearing about the experiences of people trying to run small-sized clusters
>> like mine. From everything I read, Cassandra operations go very well on
>> large (16 GB + 8 Cores) machines, but I'm sad to report I've had nothing
>> but trouble trying to run on smaller machines. Perhaps I can learn from
>> others' experience?
>>
>> Full logs can be provided to anyone interested.
>>
>> Cheers
>>
>
>
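
For readers following the advice above to revert to the default heap configuration, here is a minimal sketch of how the stock cassandra-env.sh of the 2.0/2.1 era is generally understood to size the heap when MAX_HEAP_SIZE and HEAP_NEWSIZE are left unset. The function names are hypothetical, and the constants (a 1024 MB cap on the half-of-RAM term, an 8192 MB cap on the quarter-of-RAM term, 100 MB of young generation per core) are assumptions to check against the script shipped with your version.

```python
# Sketch of the default heap sizing attributed to cassandra-env.sh (2.0/2.1).
# Function names are hypothetical; constants are assumptions to verify
# against the cassandra-env.sh shipped with your Cassandra version.

def default_max_heap_mb(system_memory_mb: int) -> int:
    """Return max(min(RAM/2, 1024 MB), min(RAM/4, 8192 MB))."""
    half = system_memory_mb // 2
    quarter = half // 2
    return max(min(half, 1024), min(quarter, 8192))


def default_new_gen_mb(max_heap_mb: int, cpu_cores: int) -> int:
    """Return min(100 MB per core, a quarter of the max heap)."""
    return min(100 * cpu_cores, max_heap_mb // 4)


if __name__ == "__main__":
    # Machine sizes mentioned in this thread: (RAM in GB, cores).
    for ram_gb, cores in [(4, 1), (4, 2), (8, 4), (15, 2)]:
        heap = default_max_heap_mb(ram_gb * 1024)
        new_gen = default_new_gen_mb(heap, cores)
        print(f"{ram_gb} GB RAM, {cores} cores: "
              f"max heap {heap} MB, new gen {new_gen} MB")
```

If these assumptions hold, a 4 GB host would default to roughly a 1 GB heap and an 8 GB host to roughly 2 GB, both well below the 3 GB and 6 GB values tried in the original message.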
