zookeeper-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: ZK 3.4.5: Very Strange Write Latency Problem?
Date Mon, 24 Feb 2014 20:34:50 GMT
Try the CMS garbage collector and see if it helps. I think you are great at
debugging: despite being new to Java and ZK, you were able to correlate GC
activity with the latency spikes. Kudos for that.

Try the following JVM Flags.

-server -Xms<> -Xmx<> -XX:NewSize=<> -XX:MaxNewSize=<>
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70
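One way to apply these flags is the same `java.env` mechanism used later in this thread: ZooKeeper 3.4's `bin/zkEnv.sh` sources `conf/java.env` if it exists, and `zkServer.sh` passes `$JVMFLAGS` to the JVM. The sketch below reuses the 1024m heap from the thread and keeps the GC logging flags so pause times can be compared before and after the switch. The young-generation size is deliberately not filled in, because the `<>` values above are left for you to choose.

```shell
# conf/java.env: sourced by ZooKeeper's bin/zkEnv.sh; zkServer.sh passes
# $JVMFLAGS to the JVM.
# Heap size reuses the 1024m from this thread. -XX:NewSize/-XX:MaxNewSize are
# left out on purpose; add them once a young-generation size has been chosen.
export JVMFLAGS="-server -Xms1024m -Xmx1024m \
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 \
-XX:+PrintGCDetails -XX:+PrintGCDateStamps \
-XX:+PrintGCApplicationStoppedTime"
```

These are HotSpot flags for the OpenJDK 1.7 used in this thread; newer JDKs moved GC logging to `-Xlog` and removed CMS in JDK 14.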

If you use a disk as the backing store, I don't think you can get consistent
read/write latency of 5ms. There are a lot of limitations in the design (most
of them exist to ensure consistency; for example, every write waits until the
transaction log is fsynced before it is acknowledged to the client).
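The fsync cost can be checked directly on a given disk. This is a rough approximation under stated assumptions, not how ZooKeeper itself measures anything: it uses GNU `dd` with `oflag=dsync`, so each 512-byte write waits for stable storage (roughly like the transaction-log sync before an acknowledgement), and reports the average time per write. The function name `sync_write_latency_us`, the temporary file name, and the 512-byte block size are hypothetical choices for illustration.

```shell
# Rough average latency (microseconds) of small synchronous writes on the
# filesystem holding DIR. Requires GNU dd (oflag=dsync) and GNU date (%N).
# The timing includes dd's own startup, so small counts overestimate.
sync_write_latency_us() {
  dir="${1:-.}"; count="${2:-200}"
  f="$dir/zk-sync-test.$$"          # hypothetical scratch file name
  start=$(date +%s%N)
  # Each 512-byte write returns only after the data reaches stable storage.
  dd if=/dev/zero of="$f" bs=512 count="$count" oflag=dsync 2>/dev/null
  end=$(date +%s%N)
  rm -f "$f"
  echo $(( (end - start) / count / 1000 ))
}
```

Run it against the directory configured as `dataLogDir` (or `dataDir`, if `dataLogDir` is not set) in `zoo.cfg`, e.g. `sync_write_latency_us /path/to/datalog 500`. If a single synchronous write already takes several milliseconds there, a consistent 5ms end-to-end write is unlikely.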

A RAM disk might give you that performance, but you need to be prepared for
the catastrophic scenario where all ZooKeeper servers go down, since data on
a RAM disk does not survive a machine restart.

thanks,
Kishore G



On Mon, Feb 24, 2014 at 9:36 AM, jmmec <jmmec2009@gmail.com> wrote:

> Hey everyone,
>
> Did I mention that I'm a newbie to ZooKeeper and also to JAVA?   :)
>
> I enabled some JAVA GC logs via the "java.env" file:
>
> export JVMFLAGS="-Xms1024m -Xmx1024m -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime"
>
> and confirmed that the periodic latency is due to JAVA GC operations.
>
> For example, below is a 26ms delay which corresponds to a 26ms delay that
> my test app also saw (it uses the C API and connects to ZK remotely) and as
> also reported by ZK which is the only JAVA app running in the ZK cluster:
>
> 2014-02-24T10:29:51.905-0600: [GC [PSYoungGen: 275424K->12128K(305152K)] 325542K->73974K(1004544K), 0.0255720 secs] [Times: user=0.09 sys=0.00, real=0.03 secs]
> 2014-02-24T10:29:51.931-0600: Total time for which application threads were stopped: 0.0261350 seconds
>
> JAVA JVM tuning seems to be more of a black art than a science with respect
> to GC and other settings.  I was wondering if anyone has any practical
> advice for JVM settings for the following configuration:
>
> a) ZK 3-node cluster running OpenJDK 1.7; ZK is the only app running JAVA.
> b) Application znode data and watches will fit into < 100MB of RAM (say
> 250k znodes with ~150 bytes per znode with 2 watchers per znode)
>
> Consistent and fast read / write latency - say 5ms or less - is critical
> for the small dataset above.  I'm trying to understand if this is
> obtainable with ZK & JAVA.  I realize that other factors come into play as
> well (hardware / network).
>
> Thanks in advance for any advice.
>
>
> On Fri, Feb 21, 2014 at 7:51 AM, jmmec <jmmec2009@gmail.com> wrote:
>
> > Thanks Camille, I definitely understand!  :)
> >
> > The two questions at the top of mind regarding ZooKeeper are:
> > 1. How does it calculate latencies?  I can dig into its code to see.
> > 2. Is there anything in particular that might cause it to have the spiky
> > latency I've experienced?  I think I ruled out the snapshot behavior by
> > having a high snapCount.
> >
> > Some other things I am planning to explore:
> > 1. My test software is rightfully suspect, so I'll review it carefully
> > again and will simplify it further so that it is doing the absolute bare
> > minimum.
> > 2. I'm running OpenJDK 1.7.0_60-ea so might swap to an earlier and/or
> > different distribution.
> > 3. I'm running ZooKeeper 3.4.5 and might fall back to the 3.3.6 release.
> >
> > Hopefully one of the items above will reveal the root cause.  Any other
> > suggestions are welcome.
> >
> >
> >
> > On Thu, Feb 20, 2014 at 7:57 PM, Camille Fournier <camille@apache.org>
> > wrote:
> >
> >> I might suggest that you create a personal github and mock up a
> >> replication there :) I understand employers that own your code but
> >> unless someone knows the answer off the top of their head, odds of
> >> finding the cause are low without something that replicates it, and
> >> knowing how busy most of us are here I don't know that we'll have time
> >> to do that for you.
> >>
> >> C
> >>
> >>
> >> On Thu, Feb 20, 2014 at 9:41 PM, jmmec <jmmec2009@gmail.com> wrote:
> >>
> >> > Thanks again,
> >> >
> >> > Unfortunately I can't share the test code since it is technically the
> >> > property of my employer.
> >> >
> >> > It's very strange behavior.  I think I've said that several times now.
> >> > ha...
> >> >
> >> > Appreciate any additional help or advice or suggestions from everyone
> >> > and anyone and their brother or sister.
> >> >
> >> >
> >> >
> >> > On Thu, Feb 20, 2014 at 8:10 PM, Camille Fournier <camille@apache.org>
> >> > wrote:
> >> >
> >> > > Can you share the test code somewhere (github maybe?)?
> >> > >
> >> > > Thanks,
> >> > > C
> >> > >
> >> > >
> >> > > On Thu, Feb 20, 2014 at 9:08 PM, jmmec <jmmec2009@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Thanks for the quick reply.
> >> > > >
> >> > > > I did not try the "slow" test using a normal disk drive, however
> >> > > > I first discovered this problem when writing to a 7200RPM disk
> >> > > > drive at a much higher messaging rate (e.g. 1500 to 3000
> >> > > > creates/sec rather than 84 creates/sec).  This is what caused me
> >> > > > to start simplifying the configuration trying to find the root
> >> > > > cause.  As part of that investigation, I created a RAM disk to
> >> > > > avoid the hard drive, but the hard drive wasn't the problem.  I
> >> > > > just haven't switched back to the hard drive.
> >> > > >
> >> > > > I don't know what ZooKeeper is doing internally, or how & why it
> >> > > > is deriving 76ms MAX latency.  The very regular periodic pattern
> >> > > > suggests something odd.
> >> > > >
> >> > > > Hmmmm.....
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
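The GC log excerpt in the Feb 24 message can be screened automatically against the 5ms target. A minimal sketch, assuming the log was written with `-XX:+PrintGCApplicationStoppedTime` and `-XX:+PrintGCDateStamps` (as in the thread) and that each entry sits on one line in the log file (the archive wraps them). The helper name `gc_pauses_over` is hypothetical; it prints every stop-the-world pause longer than a threshold given in milliseconds.

```shell
# Print each "application threads were stopped" pause above a threshold.
# Usage: gc_pauses_over <threshold_ms> < gc.log
gc_pauses_over() {
  awk -v limit_ms="$1" '
    /Total time for which application threads were stopped:/ {
      # The pause, in seconds, is the field just before the word "seconds".
      for (i = 1; i < NF; i++) {
        if ($(i + 1) == "seconds") {
          ms = $i * 1000
          # $1 is the date stamp when -XX:+PrintGCDateStamps is enabled.
          if (ms > limit_ms) printf "%s %.1f ms\n", $1, ms
        }
      }
    }'
}
```

For the 26ms pause quoted above, `gc_pauses_over 5` prints `2014-02-24T10:29:51.931-0600: 26.1 ms`. By default the JVM writes this output to stdout, which `zkServer.sh` redirects to `zookeeper.out`; `-Xloggc:<file>` sends it to a dedicated file instead.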
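For question 1 in the Feb 21 message (how ZooKeeper calculates latencies), the server exposes its own counters through the `srvr` four-letter-word command, whose output includes a `Latency min/avg/max:` line in milliseconds; `srst` resets those statistics between test runs. As far as I understand, these figures are measured inside the server, roughly from when a request arrives to when it has been processed, so they exclude network and client-side time. The helper names below are hypothetical, and `nc` must be installed.

```shell
# Parse "Latency min/avg/max: A/B/C" (milliseconds) from srvr/stat output.
parse_latency() {
  awk -F': ' '/^Latency min\/avg\/max:/ {
    split($2, v, "/")
    printf "min=%s avg=%s max=%s\n", v[1], v[2], v[3]
  }'
}

# Query one server; host and port default to localhost:2181.
zk_latency() {
  echo srvr | nc "${1:-localhost}" "${2:-2181}" | parse_latency
}

# Reset the server's statistics before a test run.
zk_reset_stats() {
  echo srst | nc "${1:-localhost}" "${2:-2181}"
}
```

Running `zk_reset_stats` on each node before a test and `zk_latency` after it makes the server-side maximum directly comparable with what the C client observed.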
