incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: StackOverflowError on high load
Date Sat, 20 Feb 2010 19:36:39 GMT
If OPP is configured with imbalanced ranges (or ranges less balanced than
RP would give), then that would explain it.

OPP is actually slightly faster in terms of raw speed.
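To make the ring-balance point concrete, here is a small illustrative sketch (not Cassandra code; the tokens, ring size, and function are made up for illustration) of how token placement determines each node's share of the ring. With the order-preserving partitioner the tokens follow the key distribution, so skewed keys mean skewed ownership; the random partitioner hashes keys, which tends toward an even spread:

```python
# Hypothetical sketch: each node owns the ring span between the previous
# node's token and its own. Skewed token placement => skewed ownership.

def ownership(tokens, ring_size):
    """Fraction of the ring each node owns, given numeric tokens."""
    tokens = sorted(tokens)
    shares = {}
    for i, t in enumerate(tokens):
        prev = tokens[i - 1]          # wraps to the last token for i == 0
        span = (t - prev) % ring_size
        shares[t] = span / ring_size
    return shares

# Two nodes on a ring of size 100: balanced vs. skewed token placement.
print(ownership([0, 50], 100))   # {0: 0.5, 50: 0.5} -- each owns 50%
print(ownership([0, 90], 100))   # {0: 0.1, 90: 0.9} -- 10% vs. 90%
```

With the skewed placement, the node owning 90% of the ring takes roughly nine times the write load, which is the kind of imbalance that could put one host's compaction permanently behind.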

On Sat, Feb 20, 2010 at 2:31 PM, Ran Tavory <rantav@gmail.com> wrote:
> interestingly, I ran the same load but this time with a random partitioner
> and, although from time to time test2 was a little behind with its
> compaction task, it did not crash and was able to eventually close the gaps
> that were opened.
> Does this make sense? Is there a reason why random partitioner is less
> likely to be faulty in this scenario? The scenario is of about 1300
> writes/sec of small amounts of data to a single CF on a cluster with two
> nodes and no replication. With the order-preserving-partitioner after a few
> hours of load the compaction pool is behind on one of the hosts and
> eventually this host crashes, but with the random partitioner it doesn't
> crash.
> thanks
>
> On Sat, Feb 20, 2010 at 6:27 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> looks like test1 started gc storming, so test2 treats it as dead and
>> starts doing hinted handoff for it, which increases test2's load, even
>> though test1 is not completely dead yet.
>>
>> On Thu, Feb 18, 2010 at 1:16 AM, Ran Tavory <rantav@gmail.com> wrote:
>> > I found another interesting graph, attached.
>> > I looked at the write-count and write-latency of the CF I'm writing to
>> > and I
>> > see a few interesting things:
>> > 1. the host test2 crashed at 18:00
>> > 2. At 16:00, after a few hours of load both hosts dropped their
>> > write-count.
>> > test1 (which did not crash) started slowing down first and then test2
>> > slowed.
>> > 3. At 16:00 I start seeing high write-latency on test2 only. This takes
>> > about 2h until finally at 18:00 it crashes.
>> > Does this help?
>> >
>> > On Thu, Feb 18, 2010 at 7:44 AM, Ran Tavory <rantav@gmail.com> wrote:
>> >>
>> >> I ran the process again and after a few hours the same node crashed
>> >> the same way. Now I can tell for sure this is indeed what Jonathan
>> >> proposed - the data directory needs to be 2x its current size - but
>> >> that looks like a design problem: how large do I need to tell my
>> >> admin to set it, then?
>> >> Here's what I see when the server crashes:
>> >> $ df -h /outbrain/cassandra/data/
>> >> Filesystem            Size  Used Avail Use% Mounted on
>> >> /dev/mapper/cassandra-data
>> >>                        97G   46G   47G  50% /outbrain/cassandra/data
>> >> The directory is 97G and when the host crashes it's at 50% use.
>> >> I'm also monitoring various JMX counters and I see that COMPACTION-POOL
>> >> PendingTasks grows for a while on this host (not on the other host,
>> >> btw, which is fine, just this host) and then stays flat for 3 hours.
>> >> After 3 hours of flat it crashes. I'm attaching the graph.
>> >> When I restart cassandra on this host (no change to the file
>> >> allocation size, just a restart) it does manage to compact the data
>> >> files pretty fast, so after a minute I'm down to 12% use. So what made
>> >> it crash before but not now? (Could be the load, which isn't running
>> >> now.)
>> >> $ df -h /outbrain/cassandra/data/
>> >> Filesystem            Size  Used Avail Use% Mounted on
>> >> /dev/mapper/cassandra-data
>> >>                        97G   11G   82G  12% /outbrain/cassandra/data
>> >> The question is: what size does the data directory need to be? It's
>> >> not 2x the size of the data I expect to have (I only have 11G of real
>> >> data after compaction and the dir is 97G, so that should have been
>> >> enough). If it's 2x of something dynamic that keeps growing and isn't
>> >> bounded, then it'll just grow without limit, right? What's the bound?
>> >> Alternatively, which JMX counter thresholds are the best indicators of
>> >> an impending crash?
>> >> Thanks
>> >>
>> >> On Wed, Feb 17, 2010 at 9:00 PM, Tatu Saloranta <tsaloranta@gmail.com>
>> >> wrote:
>> >>>
>> >>> On Wed, Feb 17, 2010 at 6:40 AM, Ran Tavory <rantav@gmail.com> wrote:
>> >>> > If it's the data directory, then I have a pretty big one. Maybe
>> >>> > it's something else
>> >>> > $ df -h /outbrain/cassandra/data/
>> >>> > Filesystem            Size  Used Avail Use% Mounted on
>> >>> > /dev/mapper/cassandra-data
>> >>> >                        97G   11G   82G  12% /outbrain/cassandra/data
>> >>>
>> >>> Perhaps a temporary file? JVM defaults to /tmp, which may be on a
>> >>> smaller (root) partition?
>> >>>
>> >>> -+ Tatu +-
>> >>
>> >
>> >
>
>
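For reference, the 2x figure discussed in the thread can be sanity-checked with simple arithmetic. This is only a rule-of-thumb sketch (the factor and the function name are illustrative, not Cassandra's actual space accounting): a major compaction rewrites the SSTables it merges, so old and new copies coexist on disk until the old files are deleted, which is why provisioning roughly double the live data size is commonly advised.

```python
# Back-of-the-envelope headroom estimate for a full compaction, assuming
# old and new SSTable copies briefly coexist on disk (rule of thumb only).

def worst_case_disk_gb(live_data_gb, overhead_factor=2.0):
    """Estimated peak disk use while a full compaction is in flight."""
    return live_data_gb * overhead_factor

# 11 GB of compacted data (as in the df output above) suggests ~22 GB peak,
# well under the 97 GB volume. So a single compaction alone doesn't explain
# reaching 46 GB (50%); uncompacted SSTables piling up while the compaction
# pool falls behind would.
print(worst_case_disk_gb(11))  # 22.0
```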
