incubator-cassandra-user mailing list archives

From Ran Tavory <ran...@gmail.com>
Subject Re: StackOverflowError on high load
Date Sun, 21 Feb 2010 19:41:21 GMT
That's a big help, thanks Stu!

On Sun, Feb 21, 2010 at 9:29 PM, Stu Hood <stu.hood@rackspace.com> wrote:

> Ran,
>
> There are bounds to how large your data directory will grow, relative to
> the actual data. Please read up on compaction:
> http://wiki.apache.org/cassandra/MemtableSSTable , and if you have a
> significant number of deletes occurring, also read
> http://wiki.apache.org/cassandra/DistributedDeletes
>
> The key mitigation is to ensure that minor compactions get a chance to
> occur regularly. This will happen automatically, but the faster you write
> data to your nodes, the more behind on compactions they can get. We consider
> this a bug, and CASSANDRA-685 will be exploring solutions so that your
> client automatically backs off as a node becomes overloaded.
>
> Thanks,
> Stu
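
To make the back-off idea concrete before CASSANDRA-685 lands, here is a minimal Java sketch of the same behaviour done on the client side; the Writer interface and the blanket Exception catch are hypothetical stand-ins for whatever client library and overload signal you actually use, not the ticket's design:

// A sketch only: back off and retry when the node signals overload, instead of
// pushing writes at full speed. Writer is a hypothetical stand-in for your
// client API; catch the narrower timeout/unavailable exception your client
// actually throws rather than Exception.
public class BackoffWriter {
    interface Writer {
        void write(byte[] key, byte[] value) throws Exception;
    }

    private final Writer delegate;
    private final int maxRetries;

    BackoffWriter(Writer delegate, int maxRetries) {
        this.delegate = delegate;
        this.maxRetries = maxRetries;
    }

    void write(byte[] key, byte[] value) throws Exception {
        long delayMs = 100;                              // first pause after a failure
        for (int attempt = 0; ; attempt++) {
            try {
                delegate.write(key, value);
                return;                                  // success, no backoff needed
            } catch (Exception overloaded) {
                if (attempt >= maxRetries) {
                    throw overloaded;                    // give up after maxRetries attempts
                }
                Thread.sleep(delayMs);                   // let the node catch up on compactions
                delayMs = Math.min(delayMs * 2, 10000);  // exponential backoff, capped at 10s
            }
        }
    }
}

Wrapping the load generator's writes in something like this is essentially the "client automatically backs off" behaviour described above.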
>
> -----Original Message-----
> From: "Ran Tavory" <rantav@gmail.com>
> Sent: Sunday, February 21, 2010 9:01am
> To: cassandra-user@incubator.apache.org
> Subject: Re: StackOverflowError on high load
>
> This sort of explains it, yes, but what solution can I use?
> I do see the OPP writes go faster than the RP, so it makes sense that when
> using the OPP there's a higher chance that a host will fall behind with
> compaction and eventually crash. It's not a nice feature, but hopefully
> there are mitigations to this.
> So my question is - what are the mitigations? What should I tell my admin to
> do in order to prevent this? Telling him "increase the directory size 2x"
> isn't going to cut it, as the directory just keeps growing and is not
> bounded...
> I'm also not clear whether CASSANDRA-804 is going to be a real fix.
> Thanks
>
> On Sat, Feb 20, 2010 at 9:36 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>
> > if OPP is configured w/ imbalanced ranges (or less balanced than RP)
> > then that would explain it.
> >
> > OPP is actually slightly faster in terms of raw speed.
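
For a concrete sense of what balanced ranges look like: with the RandomPartitioner the usual approach is to space each node's InitialToken evenly over the 0..2^127 token space, as in the sketch below (the node count is just an example). With the OPP, tokens are row keys, so balance depends on the real key distribution and there is no formula like this.

// Evenly spaced InitialToken values for the RandomPartitioner's 0..2^127 token
// space -- the "balanced ranges" baseline. Node count is an example; with the
// OPP, balance depends on the actual key distribution, not a formula.
import java.math.BigInteger;

public class BalancedTokens {
    public static void main(String[] args) {
        int nodes = 2;                                     // e.g. test1 and test2
        BigInteger space = BigInteger.ONE.shiftLeft(127);  // RandomPartitioner token space size
        for (int i = 0; i < nodes; i++) {
            BigInteger token = space.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(nodes));
            System.out.println("node " + i + ": InitialToken = " + token);
        }
    }
}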
> >
> > On Sat, Feb 20, 2010 at 2:31 PM, Ran Tavory <rantav@gmail.com> wrote:
> > > Interestingly, I ran the same load but this time with a random
> > > partitioner and, although from time to time test2 was a little behind
> > > with its compaction task, it did not crash and was able to eventually
> > > close the gaps that were opened.
> > > Does this make sense? Is there a reason why the random partitioner is
> > > less likely to be faulty in this scenario? The scenario is about 1300
> > > writes/sec of small amounts of data to a single CF on a cluster with two
> > > nodes and no replication. With the order-preserving partitioner, after a
> > > few hours of load the compaction pool is behind on one of the hosts and
> > > eventually this host crashes, but with the random partitioner it doesn't
> > > crash.
> > > thanks
> > >
> > > On Sat, Feb 20, 2010 at 6:27 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> > >>
> > >> looks like test1 started gc storming, so test2 treats it as dead and
> > >> starts doing hinted handoff for it, which increases test2's load, even
> > >> though test1 is not completely dead yet.
> > >>
> > >> On Thu, Feb 18, 2010 at 1:16 AM, Ran Tavory <rantav@gmail.com> wrote:
> > >> > I found another interesting graph, attached.
> > >> > I looked at the write-count and write-latency of the CF I'm writing to
> > >> > and I see a few interesting things:
> > >> > 1. the host test2 crashed at 18:00
> > >> > 2. At 16:00, after a few hours of load both hosts dropped their
> > >> > write-count. test1 (which did not crash) started slowing down first
> > >> > and then test2 slowed.
> > >> > 3. At 16:00 I start seeing high write-latency on test2 only. This takes
> > >> > about 2h until finally at 18:00 it crashes.
> > >> > Does this help?
> > >> >
> > >> > On Thu, Feb 18, 2010 at 7:44 AM, Ran Tavory <rantav@gmail.com> wrote:
> > >> >>
> > >> >> I ran the process again and after a few hours the same node crashed
> > >> >> the same way. Now I can tell for sure this is indeed what Jonathan
> > >> >> proposed - the data directory needs to be 2x of what it is - but it
> > >> >> looks like a design problem: how large do I need to tell my admin to
> > >> >> set it, then?
> > >> >> Here's what I see when the server crashes:
> > >> >> $ df -h /outbrain/cassandra/data/
> > >> >> Filesystem            Size  Used Avail Use% Mounted on
> > >> >> /dev/mapper/cassandra-data
> > >> >>                        97G   46G   47G  50% /outbrain/cassandra/data
> > >> >> The directory is 97G and when the host crashes it's at 50% use.
> > >> >> I'm also monitoring various JMX counters and I see that COMPACTION-POOL
> > >> >> PendingTasks grows for a while on this host (not on the other host,
> > >> >> btw, which is fine, just this host) and then stays flat for 3 hours.
> > >> >> After 3 hours of flat it crashes. I'm attaching the graph.
> > >> >> When I restart cassandra on this host (no change to the file allocation
> > >> >> size, just a restart) it does manage to compact the data files pretty
> > >> >> fast, so after a minute I get to 12% use. So I wonder, what made it
> > >> >> crash before that doesn't now? (could be the load that's not running
> > >> >> now)
> > >> >> $ df -h /outbrain/cassandra/data/
> > >> >> Filesystem            Size  Used Avail Use% Mounted on
> > >> >> /dev/mapper/cassandra-data
> > >> >>                        97G   11G   82G  12% /outbrain/cassandra/data
> > >> >> The question is what size does the data directory need to be? It's not
> > >> >> 2x the size of the data I expect to have (I only have 11G of real data
> > >> >> after compaction and the dir is 97G, so that should have been enough).
> > >> >> If it's 2x of something dynamic that keeps growing and isn't bounded,
> > >> >> then it'll just grow infinitely, right? What's the bound?
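
A rough back-of-the-envelope way to think about that bound, under the assumption that compaction writes the merged SSTable before deleting its inputs: the headroom needed is about the size of everything currently on disk (compacted data plus whatever un-compacted overwrites have piled up), not 2x the post-compaction size. The backlog figure below is made up for illustration:

// Back-of-the-envelope only; the model is an assumption: compaction writes the
// merged SSTable before deleting its inputs, so the worst case (a major
// compaction of everything on disk) needs free space roughly equal to what is
// already there -- live data plus any un-compacted overwrites.
public class DiskHeadroom {
    static long worstCaseBytes(long compactedDataBytes, long backlogBytes) {
        long onDisk = compactedDataBytes + backlogBytes;  // current SSTables on disk
        return onDisk * 2;                                // plus a merged copy of up to the same size
    }

    public static void main(String[] args) {
        long compacted = 11L << 30;  // ~11G of post-compaction data, as in this thread
        long backlog = 35L << 30;    // hypothetical overwrites accumulated while compaction lags
        System.out.printf("worst case ~%d GiB, not 2 x 11 GiB%n",
                worstCaseBytes(compacted, backlog) >> 30);
    }
}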
> > >> >> Alternatively, what JMX counter thresholds are the best indicators
> > >> >> for the crash that's about to happen?
> > >> >> Thanks
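
On the JMX-threshold question, a minimal polling sketch follows. The MBean name, attribute and port are assumptions for this era of Cassandra (org.apache.cassandra.concurrent:type=COMPACTION-POOL, PendingTasks, JMX on 8080) and should be checked against your build. A pending count that keeps climbing, or sits flat for hours like the graphs described above, is the signal to throttle the writers.

// Polls the compaction thread pool backlog over JMX. ObjectName, attribute and
// port are assumptions; adjust to what your Cassandra version actually exposes.
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CompactionBacklogWatch {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";  // node to watch, e.g. test2
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":8080/jmxrmi");  // port is an assumption
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName pool = new ObjectName(
                    "org.apache.cassandra.concurrent:type=COMPACTION-POOL");  // assumed MBean name
            while (true) {
                Number pending = (Number) mbs.getAttribute(pool, "PendingTasks");
                System.out.println("COMPACTION-POOL PendingTasks = " + pending);
                Thread.sleep(60000L);  // poll once a minute; alert if this only ever grows
            }
        } finally {
            jmxc.close();
        }
    }
}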
> > >> >>
> > >> >> On Wed, Feb 17, 2010 at 9:00 PM, Tatu Saloranta <tsaloranta@gmail.com>
> > >> >> wrote:
> > >> >>>
> > >> >>> On Wed, Feb 17, 2010 at 6:40 AM, Ran Tavory <rantav@gmail.com> wrote:
> > >> >>> > If it's the data directory, then I have a pretty big one. Maybe
> > >> >>> > it's something else
> > >> >>> > $ df -h /outbrain/cassandra/data/
> > >> >>> > Filesystem            Size  Used Avail Use% Mounted on
> > >> >>> > /dev/mapper/cassandra-data
> > >> >>> >                        97G   11G   82G  12% /outbrain/cassandra/data
> > >> >>>
> > >> >>> Perhaps a temporary file? JVM defaults to /tmp, which may be on a
> > >> >>> smaller (root) partition?
> > >> >>>
> > >> >>> -+ Tatu +-
> > >> >>
> > >> >
> > >> >
> > >
> > >
> >
>
>
>
