zookeeper-user mailing list archives

From Camille Fournier <cami...@apache.org>
Subject Re: ZK 3.4.5: Very Strange Write Latency Problem?
Date Mon, 24 Feb 2014 22:13:35 GMT
You can try CMS. I don't think gc should be causing you pauses unless it's
actually cleaning old gen, eden GC should be pauseless. You can tune the
pool sizes to have enough space in old gen so you won't need pauses. The
log you have printed above seems to be indicating the pauseless GC so I'm
surprised it is causing noticeable performance degradation. But GC
ergonomics aren't that hard to manage, especially in an application that
should be used within existing process memory. Have you tried running this
on an Oracle JDK instead of OpenJDK?

C
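
For anyone who wants to see how often the pauses in a GC log exceed a latency target, here is a minimal Python sketch. It assumes the log was produced with -XX:+PrintGCApplicationStoppedTime, as in the message quoted below, and that each "Total time for which application threads were stopped" entry sits on one line. The function names stopped_pauses_ms and summarize are illustrative only, and the 5 ms budget is simply the target mentioned in the thread.

```python
import re

# Matches the line written by -XX:+PrintGCApplicationStoppedTime.
STOPPED_RE = re.compile(
    r"application threads were stopped:\s*([0-9.]+)\s*seconds"
)


def stopped_pauses_ms(log_text):
    """Return every reported stop-the-world pause, in milliseconds."""
    return [float(m.group(1)) * 1000.0 for m in STOPPED_RE.finditer(log_text)]


def summarize(pauses_ms, budget_ms=5.0):
    """Count pauses and how many exceed the (assumed) latency budget."""
    return {
        "count": len(pauses_ms),
        "max_ms": max(pauses_ms) if pauses_ms else 0.0,
        "over_budget": sum(1 for p in pauses_ms if p > budget_ms),
    }


# The two log lines from the thread, rejoined onto single lines.
SAMPLE = (
    "2014-02-24T10:29:51.905-0600: [GC [PSYoungGen: 275424K->12128K(305152K)] "
    "325542K->73974K(1004544K), 0.0255720 secs] "
    "[Times: user=0.09 sys=0.00, real=0.03 secs]\n"
    "2014-02-24T10:29:51.931-0600: Total time for which application threads "
    "were stopped: 0.0261350 seconds\n"
)

if __name__ == "__main__":
    stats = summarize(stopped_pauses_ms(SAMPLE))
    print("pauses: {count}, max: {max_ms:.1f} ms, over budget: {over_budget}".format(**stats))
```

Running it over a full log file (read with open(path).read()) gives a quick count of how many pauses would break the latency target before and after changing collector settings.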


On Mon, Feb 24, 2014 at 3:34 PM, kishore g <g.kishore@gmail.com> wrote:

> Try the CMS garbage collector and see if it improves things. I think you are
> great at debugging: being new to JAVA and ZK, you were able to correlate GC
> activity with latency spikes. Kudos for that.
>
> Try the following JVM Flags.
>
> -server -Xms<> -Xmx<> -XX:NewSize=<> -XX:MaxNewSize=<>
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70
>
> If you use disk as the backing store, I don't think you can get a consistent
> read/write latency of 5ms. There are a lot of limitations in the design (most
> of them are there to ensure consistency; for example, every write ensures that
> the transaction log is fsynced before it is acknowledged to the client).
>
> A RAM disk might give you the performance, but you need to be prepared for the
> catastrophic scenario where all ZooKeeper servers go down.
>
> thanks,
> Kishore G
>
>
>
> On Mon, Feb 24, 2014 at 9:36 AM, jmmec <jmmec2009@gmail.com> wrote:
>
> > Hey everyone,
> >
> > Did I mention that I'm a newbie to ZooKeeper and also to JAVA?   :)
> >
> > I enabled some JAVA GC logs via the "java.env" file:
> >
> > export JVMFLAGS="-Xms1024m -Xmx1024m -XX:+PrintGCDetails
> > -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime"
> >
> > and confirmed that the periodic latency is due to JAVA GC operations.
> >
> > For example, below is a 26ms delay which corresponds to a 26ms delay that
> > my test app also saw (it uses the C API and connects to ZK remotely) and as
> > also reported by ZK which is the only JAVA app running in the ZK cluster:
> >
> > 2014-02-24T10:29:51.905-0600: [GC [PSYoungGen: 275424K->12128K(305152K)] 325542K->73974K(1004544K), 0.0255720 secs] [Times: user=0.09 sys=0.00, real=0.03 secs]
> > 2014-02-24T10:29:51.931-0600: Total time for which application threads were stopped: 0.0261350 seconds
> >
> > JAVA JVM tuning seems to be more of a black art than a science with respect
> > to GC and other settings.  I was wondering if anyone has any practical
> > advice for JVM settings for the following configuration:
> >
> > a) ZK 3-node cluster running OpenJDK 1.7; ZK is the only app running JAVA.
> > b) Application znode data and watches will fit into < 100MB of RAM (say
> > 250k znodes with ~150 bytes per znode with 2 watchers per znode)
> >
> > Consistent and fast read / write latency - say 5ms or less - is critical
> > for the small dataset above.  I'm trying to understand if this is
> > obtainable with ZK & JAVA.  I realize that other factors come into play as
> > well (hardware / network).
> >
> > Thanks in advance for any advice.
> >
> >
> > On Fri, Feb 21, 2014 at 7:51 AM, jmmec <jmmec2009@gmail.com> wrote:
> >
> > > Thanks Camille, I definitely understand!  :)
> > >
> > > The two questions at the top of mind regarding ZooKeeper are:
> > > 1. How does it calculate latencies?  I can dig into its code to see.
> > > 2. Is there anything in particular that might cause it to have the spiky
> > > latency I've experienced?  I think I ruled out the snapshot behavior by
> > > having a high snapCount.
> > >
> > > Some other things I am planning to explore:
> > > 1. My test software is rightfully suspect, so I'll review it carefully
> > > again and will simplify it further so that it is doing the absolute bare
> > > minimum.
> > > 2. I'm running OpenJDK 1.7.0_60-ea so might swap to an earlier and/or
> > > different distribution.
> > > 3. I'm running ZooKeeper 3.4.5 and might fall back to the 3.3.6 release.
> > >
> > > Hopefully one of the items above will reveal the root cause.  Any other
> > > suggestions are welcome.
> > >
> > >
> > >
> > > On Thu, Feb 20, 2014 at 7:57 PM, Camille Fournier <camille@apache.org> wrote:
> > >
> > >> I might suggest that you create a personal github and mock up a replication
> > >> there :) I understand employers that own your code but unless someone knows
> > >> the answer off the top of their head, odds of finding the cause are low
> > >> without something that replicates it, and knowing how busy most of us are
> > >> here I don't know that we'll have time to do that for you.
> > >>
> > >> C
> > >>
> > >>
> > >> On Thu, Feb 20, 2014 at 9:41 PM, jmmec <jmmec2009@gmail.com> wrote:
> > >>
> > >> > Thanks again,
> > >> >
> > >> > Unfortunately I can't share the test code since it is technically the
> > >> > property of my employer.
> > >> >
> > >> > It's very strange behavior.  I think I've said that several times now.
> > >> > ha...
> > >> >
> > >> > Appreciate any additional help or advice or suggestions from everyone and
> > >> > anyone and their brother or sister.
> > >> >
> > >> >
> > >> >
> > >> > On Thu, Feb 20, 2014 at 8:10 PM, Camille Fournier <camille@apache.org> wrote:
> > >> >
> > >> > > Can you share the test code somewhere (github maybe?)?
> > >> > >
> > >> > > Thanks,
> > >> > > C
> > >> > >
> > >> > >
> > >> > > On Thu, Feb 20, 2014 at 9:08 PM, jmmec <jmmec2009@gmail.com> wrote:
> > >> > >
> > >> > > > Thanks for the quick reply.
> > >> > > >
> > >> > > > I did not try the "slow" test using a normal disk drive, however I first
> > >> > > > discovered this problem when writing to a 7200RPM disk drive at a much
> > >> > > > higher messaging rate (e.g. 1500 to 3000 creates/sec rather than 84
> > >> > > > creates/sec).  This is what caused me to start simplifying the
> > >> > > > configuration trying to find the root cause.  As part of that
> > >> > > > investigation, I created a RAM disk to avoid the hard drive, but the hard
> > >> > > > drive wasn't the problem.  I just haven't switched back to the hard drive.
> > >> > > >
> > >> > > > I don't know what ZooKeeper is doing internally, or how & why it is
> > >> > > > deriving 76ms MAX latency.  The very regular periodic pattern suggests
> > >> > > > something odd.
> > >> > > >
> > >> > > > Hmmmm.....
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
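
Kishore's point above, that every write waits for the transaction log to be fsynced, can be checked roughly on a given disk with a short Python sketch. This is not ZooKeeper's code: it only appends small records to a scratch file and times write plus fsync. The function name fsync_latencies_ms, the 150-byte payload (borrowed from the znode size estimate in the thread) and the default of 50 writes are assumptions for illustration.

```python
import os
import tempfile
import time


def fsync_latencies_ms(directory, writes=50, payload=b"x" * 150):
    """Append small records to a scratch file in `directory`, calling
    fsync after each one, and return the per-write latencies in ms."""
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # block until the record reaches stable storage
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


if __name__ == "__main__":
    # A temporary directory keeps the example self-contained; it may live
    # on a RAM-backed filesystem, so point this at the real log disk instead.
    with tempfile.TemporaryDirectory() as scratch:
        samples = sorted(fsync_latencies_ms(scratch))
        print("median ms:", samples[len(samples) // 2], "max ms:", samples[-1])
```

Pointing the directory argument at the device that holds the transaction log, and then at a RAM disk, gives a rough feel for how much of a 5 ms budget the disk alone would use.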
