cassandra-user mailing list archives

From "Stu Hood" <stu.h...@rackspace.com>
Subject Re: StackOverflowError on high load
Date Sun, 21 Feb 2010 19:29:53 GMT
Ran,

There are bounds on how large your data directory will grow relative to the actual data:
compaction merges SSTables into a new file before removing the inputs, so disk usage can
transiently approach roughly twice the live data size, but it comes back down once the old
files are deleted. Please read up on compaction: http://wiki.apache.org/cassandra/MemtableSSTable ,
and if you have a significant number of deletes occurring, also read
http://wiki.apache.org/cassandra/DistributedDeletes
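
A quick way to watch whether compaction is keeping up on a node is the COMPACTION-POOL
PendingTasks counter you are already graphing over JMX. A minimal standalone check might look
like the sketch below; the MBean name and the JMX port (8080, the default) are assumptions
about your setup, so verify them in jconsole and adjust to match your configuration:

// Hedged sketch: polls a node's compaction backlog over JMX.
// The MBean name and port below are assumptions - check them in jconsole first.
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CompactionBacklogCheck {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        long warnThreshold = 50; // hypothetical threshold; tune it against your own graphs

        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName pool = new ObjectName(
                    "org.apache.cassandra.concurrent:type=COMPACTION-POOL");
            long pending = ((Number) mbs.getAttribute(pool, "PendingTasks")).longValue();
            System.out.println(host + " COMPACTION-POOL PendingTasks = " + pending);
            if (pending > warnThreshold) {
                System.out.println("WARNING: compaction is falling behind on " + host);
            }
        } finally {
            connector.close();
        }
    }
}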

The key mitigation is to ensure that minor compactions get a chance to occur regularly. They
happen automatically, but the faster you write data to your nodes, the further behind on
compactions they can fall. We consider this a bug, and CASSANDRA-685 is exploring solutions
so that your client automatically backs off as a node becomes overloaded.
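
Until that lands, a rough sketch of what client-side backoff could look like is below. The
Writer interface and the retry/sleep numbers are placeholders for whatever client library and
thresholds you actually use, not part of any Cassandra API:

// Hedged sketch: retry a write with exponential backoff when the node signals
// overload (e.g. a timeout from your client library), giving compaction a
// chance to catch up. Writer is a placeholder for your client's write call.
public final class BackoffWriter {

    public interface Writer {
        void write() throws Exception;
    }

    public static void writeWithBackoff(Writer writer, int maxRetries)
            throws Exception {
        long sleepMs = 100;                      // initial pause
        for (int attempt = 0; ; attempt++) {
            try {
                writer.write();
                return;                          // success
            } catch (Exception overloaded) {     // e.g. a timeout exception
                if (attempt >= maxRetries) {
                    throw overloaded;            // give up after maxRetries attempts
                }
                Thread.sleep(sleepMs);           // back off before retrying
                sleepMs = Math.min(sleepMs * 2, 30000); // cap the pause at 30s
            }
        }
    }
}

Something like that, wired into your load generator, should keep a single overloaded node from
being pushed further behind while compaction catches up.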

Thanks,
Stu

-----Original Message-----
From: "Ran Tavory" <rantav@gmail.com>
Sent: Sunday, February 21, 2010 9:01am
To: cassandra-user@incubator.apache.org
Subject: Re: StackOverflowError on high load

This sort of explains it, yes, but what solution can I use?
I do see that OPP writes go faster than RP writes, so it makes sense that when
using OPP there's a higher chance that a host will fall behind with
compaction and eventually crash. It's not a nice property, but hopefully
there are mitigations for it.
So my question is - what are the mitigations? What should I tell my admin to
do in order to prevent this? Telling him "increase the directory size 2x"
isn't going to cut it, as the directory just keeps growing and is not
bounded...
I'm also not clear on whether CASSANDRA-804 is going to be a real fix.
Thanks

On Sat, Feb 20, 2010 at 9:36 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

> if OPP is configured w/ imbalanced ranges (or less balanced than RP)
> then that would explain it.
>
> OPP is actually slightly faster in terms of raw speed.
>
> On Sat, Feb 20, 2010 at 2:31 PM, Ran Tavory <rantav@gmail.com> wrote:
> > Interestingly, I ran the same load but this time with a random
> > partitioner and, although from time to time test2 was a little behind
> > with its compaction task, it did not crash and was able to eventually
> > close the gaps that were opened.
> > Does this make sense? Is there a reason why the random partitioner is
> > less likely to be faulty in this scenario? The scenario is about 1300
> > writes/sec of small amounts of data to a single CF on a cluster with two
> > nodes and no replication. With the order-preserving partitioner, after a
> > few hours of load the compaction pool is behind on one of the hosts and
> > eventually this host crashes, but with the random partitioner it doesn't
> > crash.
> > Thanks
> >
> > On Sat, Feb 20, 2010 at 6:27 AM, Jonathan Ellis <jbellis@gmail.com>
> > wrote:
> >>
> >> looks like test1 started gc storming, so test2 treats it as dead and
> >> starts doing hinted handoff for it, which increases test2's load, even
> >> though test1 is not completely dead yet.
> >>
> >> On Thu, Feb 18, 2010 at 1:16 AM, Ran Tavory <rantav@gmail.com> wrote:
> >> > I found another interesting graph, attached.
> >> > I looked at the write-count and write-latency of the CF I'm writing to
> >> > and I see a few interesting things:
> >> > 1. the host test2 crashed at 18:00
> >> > 2. At 16:00, after a few hours of load both hosts dropped their
> >> > write-count. test1 (which did not crash) started slowing down first and
> >> > then test2 slowed.
> >> > 3. At 16:00 I start seeing high write-latency on test2 only. This takes
> >> > about 2h until finally at 18:00 it crashes.
> >> > Does this help?
> >> >
> >> > On Thu, Feb 18, 2010 at 7:44 AM, Ran Tavory <rantav@gmail.com> wrote:
> >> >>
> >> >> I ran the process again and after a few hours the same node crashed the
> >> >> same way. Now I can tell for sure this is indeed what Jonathan proposed -
> >> >> the data directory needs to be 2x of what it is, but it looks like a
> >> >> design problem: how large do I need to tell my admin to set it, then?
> >> >> Here's what I see when the server crashes:
> >> >> $ df -h /outbrain/cassandra/data/
> >> >> Filesystem            Size  Used Avail Use% Mounted on
> >> >> /dev/mapper/cassandra-data
> >> >>                        97G   46G   47G  50% /outbrain/cassandra/data
> >> >> The directory is 97G and when the host crashes it's at 50% use.
> >> >> I'm also monitoring various JMX counters and I see that COMPACTION-POOL
> >> >> PendingTasks grows for a while on this host (not on the other host, btw,
> >> >> which is fine, just this host) and then stays flat for 3 hours. After 3
> >> >> hours of flat it crashes. I'm attaching the graph.
> >> >> When I restart cassandra on this host (no change to the file allocation
> >> >> size, just a restart) it does manage to compact the data files pretty
> >> >> fast, so after a minute I get 12% use. So I wonder what made it crash
> >> >> before that doesn't happen now? (could be the load that's not running now)
> >> >> $ df -h /outbrain/cassandra/data/
> >> >> Filesystem            Size  Used Avail Use% Mounted on
> >> >> /dev/mapper/cassandra-data
> >> >>                        97G   11G   82G  12% /outbrain/cassandra/data
> >> >> The question is what size does the data directory need to be? It's not 2x
> >> >> the size of the data I expect to have (I only have 11G of real data after
> >> >> compaction and the dir is 97G, so it should have been enough). If it's 2x
> >> >> of something dynamic that keeps growing and isn't bounded, then it'll just
> >> >> grow infinitely, right? What's the bound?
> >> >> Alternatively, what JMX counter thresholds are the best indicators of the
> >> >> crash that's about to happen?
> >> >> Thanks
> >> >>
> >> >> On Wed, Feb 17, 2010 at 9:00 PM, Tatu Saloranta <tsaloranta@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> On Wed, Feb 17, 2010 at 6:40 AM, Ran Tavory <rantav@gmail.com> wrote:
> >> >>> > If it's the data directory, then I have a pretty big one. Maybe it's
> >> >>> > something else
> >> >>> > $ df -h /outbrain/cassandra/data/
> >> >>> > Filesystem            Size  Used Avail Use% Mounted on
> >> >>> > /dev/mapper/cassandra-data
> >> >>> >                        97G   11G   82G  12% /outbrain/cassandra/data
> >> >>>
> >> >>> Perhaps a temporary file? JVM defaults to /tmp, which may be on a
> >> >>> smaller (root) partition?
> >> >>>
> >> >>> -+ Tatu +-


