incubator-cassandra-user mailing list archives

From: Jonathan Ellis <jbel...@gmail.com>
Subject: Re: StackOverflowError on high load
Date: Sat, 20 Feb 2010 04:27:38 GMT
Looks like test1 started GC storming, so test2 treats it as dead and
starts doing hinted handoff for it, which increases test2's load even
though test1 is not completely dead yet.
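
A quick way to confirm the "GC storming" half of this diagnosis is to poll the
standard JVM GarbageCollector MBeans on test1 over JMX and watch the collection
counts and accumulated GC time climb. The sketch below is illustrative only:
the host name and the JMX port (8080 here) are assumptions, so substitute
whatever your nodes actually expose.

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class GcPressureCheck {
    public static void main(String[] args) throws Exception {
        // Host and port are placeholders; point them at the suspect node's JMX endpoint.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://test1:8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // The java.lang GC MBeans are standard; each one reports how often it ran
            // and how much wall-clock time it has spent collecting.
            Set<ObjectName> gcs = mbs.queryNames(
                    new ObjectName("java.lang:type=GarbageCollector,*"), null);
            for (ObjectName gc : gcs) {
                long count = (Long) mbs.getAttribute(gc, "CollectionCount");
                long millis = (Long) mbs.getAttribute(gc, "CollectionTime");
                System.out.println(gc.getKeyProperty("name")
                        + ": collections=" + count + ", time=" + millis + " ms");
            }
        } finally {
            connector.close();
        }
    }
}

Sampling this every minute and diffing the numbers makes a GC storm obvious:
the collection time starts eating a large fraction of each sampling interval.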

On Thu, Feb 18, 2010 at 1:16 AM, Ran Tavory <rantav@gmail.com> wrote:
> I found another interesting graph, attached.
> I looked at the write-count and write-latency of the CF I'm writing to and I
> see a few interesting things:
> 1. The host test2 crashed at 18:00.
> 2. At 16:00, after a few hours of load, both hosts dropped their write-count.
> test1 (which did not crash) started slowing down first, and then test2
> slowed.
> 3. At 16:00 I start seeing high write-latency on test2 only. This goes on
> for about 2h until it finally crashes at 18:00.
> Does this help?
>
> On Thu, Feb 18, 2010 at 7:44 AM, Ran Tavory <rantav@gmail.com> wrote:
>>
>> I ran the process again and after a few hours the same node crashed in the
>> same way. Now I can tell for sure this is indeed what Jonathan proposed -
>> the data directory needs to be 2x its current size. But that looks like a
>> design problem: how large do I need to tell my admin to make it, then?
>> Here's what I see when the server crashes:
>> $ df -h /outbrain/cassandra/data/
>> Filesystem            Size  Used Avail Use% Mounted on
>> /dev/mapper/cassandra-data
>>                        97G   46G   47G  50% /outbrain/cassandra/data
>> The directory is 97G and when the host crashes it's at 50% use.
>> I'm also monitoring various JMX counters and I see that COMPACTION-POOL
>> PendingTasks grows for a while on this host (not on the other host, btw,
>> which is fine, just this one) and then stays flat for 3 hours. After 3
>> hours of being flat it crashes. I'm attaching the graph.
>> When I restart Cassandra on this host (no change to the file allocation
>> size, just a restart) it does manage to compact the data files pretty fast,
>> so after a minute I'm down to 12% use. So I wonder: what made it crash
>> before that doesn't make it crash now? (It could be the load, which isn't
>> running now.)
>> $ df -h /outbrain/cassandra/data/
>> Filesystem            Size  Used Avail Use% Mounted on
>> /dev/mapper/cassandra-data
>>                        97G   11G   82G  12% /outbrain/cassandra/data
>> The question is: what size does the data directory need to be? It's
>> apparently not just 2x the size of the data I expect to have (I only have
>> 11G of real data after compaction and the dir is 97G, so that should have
>> been enough). If it's 2x of something dynamic that keeps growing and isn't
>> bounded, then it'll just grow indefinitely, right? What's the bound?
>> Alternatively, what JMX counter thresholds are the best indicators of the
>> crash that's about to happen?
>> Thanks
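
For the last question - which JMX counter to watch - the COMPACTION-POOL
PendingTasks figure mentioned above can also be read programmatically, so it
can be graphed or alerted on instead of eyeballed in jconsole. This is only a
sketch: the MBean name (org.apache.cassandra.concurrent:type=COMPACTION-POOL)
is inferred from the counter named in this thread, and the host and port are
placeholders for whatever your nodes expose over JMX.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CompactionBacklogCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port; adjust to the node being watched.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://test2:8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // MBean name assumed from the "COMPACTION-POOL PendingTasks" counter above.
            ObjectName pool = new ObjectName(
                    "org.apache.cassandra.concurrent:type=COMPACTION-POOL");
            Number pending = (Number) mbs.getAttribute(pool, "PendingTasks");
            System.out.println("COMPACTION-POOL PendingTasks = " + pending);
            // A count that keeps climbing, or plateaus at a high level as in the
            // attached graph, means compaction is not keeping up with the write load.
        } finally {
            connector.close();
        }
    }
}

There is no universal threshold; the useful signal is the trend - pending
compactions rising together with disk use on the data partition.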
>>
>> On Wed, Feb 17, 2010 at 9:00 PM, Tatu Saloranta <tsaloranta@gmail.com>
>> wrote:
>>>
>>> On Wed, Feb 17, 2010 at 6:40 AM, Ran Tavory <rantav@gmail.com> wrote:
>>> > If it's the data directory, then I have a pretty big one. Maybe it's
>>> > something else.
>>> > $ df -h /outbrain/cassandra/data/
>>> > Filesystem            Size  Used Avail Use% Mounted on
>>> > /dev/mapper/cassandra-data
>>> >                        97G   11G   82G  12% /outbrain/cassandra/data
>>>
>>> Perhaps a temporary file? JVM defaults to /tmp, which may be on a
>>> smaller (root) partition?
>>>
>>> -+ Tatu +-
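
Tatu's temp-file theory is easy to rule in or out: print where java.io.tmpdir
points for the Cassandra process and how much space is usable there. The
sketch below is plain Java with no Cassandra-specific assumptions; the
-Djava.io.tmpdir override mentioned in the comment is a standard JVM system
property, and the path used there is just an example.

import java.io.File;

public class TmpDirCheck {
    public static void main(String[] args) {
        // Where does this JVM put temporary files, and how much room is left there?
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        long usableGb = tmp.getUsableSpace() / (1024L * 1024 * 1024);
        System.out.println("java.io.tmpdir = " + tmp.getAbsolutePath()
                + " (" + usableGb + " GB usable)");
        // If this lands on a small root partition, it can be redirected with a JVM
        // flag such as -Djava.io.tmpdir=/outbrain/cassandra/tmp (example path).
    }
}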
>>
>
>
