zookeeper-user mailing list archives

From jmmec <jmmec2...@gmail.com>
Subject Re: ZK 3.4.5: Very Strange Write Latency Problem?
Date Mon, 24 Feb 2014 17:36:31 GMT
Hey everyone,

Did I mention that I'm a newbie to ZooKeeper and also to Java?   :)

I enabled Java GC logging via the "java.env" file:

export JVMFLAGS="-Xms1024m -Xmx1024m -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime"

and confirmed that the periodic latency is caused by Java GC pauses.

For example, the 26ms GC pause below matches a 26ms delay seen by my test
app (which uses the C API and connects to ZK remotely) and also reported by
ZK itself, which is the only Java app running in the ZK cluster:

2014-02-24T10:29:51.905-0600: [GC [PSYoungGen: 275424K->12128K(305152K)] 325542K->73974K(1004544K), 0.0255720 secs] [Times: user=0.09 sys=0.00, real=0.03 secs]
2014-02-24T10:29:51.931-0600: Total time for which application threads were stopped: 0.0261350 seconds
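
A minimal sketch, assuming the log format shown above, of how the
"application threads were stopped" records could be scanned for pauses above
a latency budget. The function name long_pauses and the 5 ms default
threshold are illustrative choices, not part of ZooKeeper or the JVM.

```python
import re

# Matches the records written by -XX:+PrintGCApplicationStoppedTime when
# -XX:+PrintGCDateStamps is also enabled. \s+ is used between words so that
# records wrapped across lines (as in an email) still match.
STOPPED_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+[+-]\d{4}):\s+"
    r"Total\s+time\s+for\s+which\s+application\s+threads\s+were\s+stopped:\s+"
    r"(?P<secs>\d+\.\d+)\s+seconds"
)


def long_pauses(log_text, threshold_ms=5.0):
    """Return (timestamp, pause_ms) for every stop longer than threshold_ms."""
    pauses = []
    for match in STOPPED_RE.finditer(log_text):
        pause_ms = float(match.group("secs")) * 1000.0
        if pause_ms > threshold_ms:
            pauses.append((match.group("ts"), pause_ms))
    return pauses


# The two log records quoted in this message.
SAMPLE = (
    "2014-02-24T10:29:51.905-0600: [GC [PSYoungGen: 275424K->12128K(305152K)] "
    "325542K->73974K(1004544K), 0.0255720 secs] [Times: user=0.09 sys=0.00, "
    "real=0.03 secs]\n"
    "2014-02-24T10:29:51.931-0600: Total time for which application threads "
    "were stopped: 0.0261350 seconds\n"
)

if __name__ == "__main__":
    for ts, ms in long_pauses(SAMPLE):
        print(f"{ts}  stopped for {ms:.1f} ms")
```

Comparing the reported timestamps against the client-side latency spikes is
one way to check whether every spike lines up with a GC pause.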

JVM tuning seems to be more of a black art than a science with respect to
GC and other settings.  Does anyone have practical advice on JVM settings
for the following configuration:

a) ZK 3-node cluster running OpenJDK 1.7; ZK is the only Java app running.
b) Application znode data and watches will fit into < 100MB of RAM (say
250k znodes at ~150 bytes each, with 2 watchers per znode)
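
As a rough check on the sizing in (b), the raw payload can be worked out
directly. This sketch counts only the znode data bytes; per-znode overhead
(paths, stat structures, Java object headers) and the memory used by watches
are not estimated here, and the constant names are illustrative.

```python
# Figures taken from item (b) above.
ZNODE_COUNT = 250_000
BYTES_PER_ZNODE = 150
WATCHERS_PER_ZNODE = 2

# Raw data payload only; JVM and ZooKeeper bookkeeping overhead is excluded.
raw_payload_bytes = ZNODE_COUNT * BYTES_PER_ZNODE
total_watches = ZNODE_COUNT * WATCHERS_PER_ZNODE

if __name__ == "__main__":
    print(f"raw payload: {raw_payload_bytes / 1e6:.1f} MB")
    print(f"total watches: {total_watches}")
```

The raw data comes to 37.5 MB, so the < 100MB figure leaves the remainder
for overhead and for the 500,000 watches.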

Consistent and fast read / write latency - say 5ms or less - is critical
for the small dataset above.  I'm trying to understand whether this is
attainable with ZK & Java.  I realize that other factors (hardware /
network) come into play as well.

Thanks in advance for any advice.


On Fri, Feb 21, 2014 at 7:51 AM, jmmec <jmmec2009@gmail.com> wrote:

> Thanks Camille, I definitely understand!  :)
>
> The two questions at the top of mind regarding ZooKeeper are:
> 1. How does it calculate latencies?  I can dig into its code to see.
> 2. Is there anything in particular that might cause it to have the spiky
> latency I've experienced?  I think I ruled out the snapshot behavior by
> having a high snapCount.
>
> Some other things I am planning to explore:
> 1. My test software is rightfully suspect, so I'll review it carefully
> again and will simplify it further so that it is doing the absolute bare
> minimum.
> 2. I'm running OpenJDK 1.7.0_60-ea so might swap to an earlier and/or
> different distribution.
> 3. I'm running ZooKeeper 3.4.5 and might fall back to the 3.3.6 release.
>
> Hopefully one of the items above will reveal the root cause.  Any other
> suggestions are welcome.
>
>
>
> On Thu, Feb 20, 2014 at 7:57 PM, Camille Fournier <camille@apache.org> wrote:
>
>> I might suggest that you create a personal github and mock up a
>> replication there :) I understand employers that own your code but unless
>> someone knows the answer off the top of their head, odds of finding the
>> cause are low without something that replicates it, and knowing how busy
>> most of us are here I don't know that we'll have time to do that for you.
>>
>> C
>>
>>
>> On Thu, Feb 20, 2014 at 9:41 PM, jmmec <jmmec2009@gmail.com> wrote:
>>
>> > Thanks again,
>> >
>> > Unfortunately I can't share the test code since it is technically the
>> > property of my employer.
>> >
>> > It's very strange behavior.  I think I've said that several times now.
>> > ha...
>> >
>> > Appreciate any additional help or advice or suggestions from everyone
>> > and anyone and their brother or sister.
>> >
>> >
>> >
>> > On Thu, Feb 20, 2014 at 8:10 PM, Camille Fournier <camille@apache.org> wrote:
>> >
>> > > Can you share the test code somewhere (github maybe?)?
>> > >
>> > > Thanks,
>> > > C
>> > >
>> > >
>> > > On Thu, Feb 20, 2014 at 9:08 PM, jmmec <jmmec2009@gmail.com> wrote:
>> > >
>> > > > Thanks for the quick reply.
>> > > >
>> > > > I did not try the "slow" test using a normal disk drive, however I
>> > > > first discovered this problem when writing to a 7200RPM disk drive
>> > > > at a much higher messaging rate (e.g. 1500 to 3000 creates/sec
>> > > > rather than 84 creates/sec).  This is what caused me to start
>> > > > simplifying the configuration trying to find the root cause.  As
>> > > > part of that investigation, I created a RAM disk to avoid the hard
>> > > > drive, but the hard drive wasn't the problem.  I just haven't
>> > > > switched back to the hard drive.
>> > > >
>> > > > I don't know what ZooKeeper is doing internally, or how & why it is
>> > > > deriving 76ms MAX latency.  The very regular periodic pattern
>> > > > suggests something odd.
>> > > >
>> > > > Hmmmm.....
>> > > >
>> > >
>> >
>>
>
>
