zookeeper-user mailing list archives

From jmmec <jmmec2...@gmail.com>
Subject Re: ZK 3.4.5: Very Strange Write Latency Problem?
Date Tue, 25 Feb 2014 22:55:55 GMT
Thanks very much Kishore, Camille, and Patrick,

Unless I'm misunderstanding something, I'm becoming convinced that the MAX
latencies are due to JAVA GC and my failure to tune the JVM appropriately.  My
email from yesterday showed one 26ms latency sample that occurred during JAVA
GC, and more test runs today confirmed similar behavior.
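
(Side note, in case it helps anyone reproducing this: the server-side latency
counters that ZK maintains can be read, and reset between runs, with the
four-letter words.  The host/port below are just my local placeholders.)

echo srvr | nc localhost 2181   # prints Latency min/avg/max as seen by the server
echo srst | nc localhost 2181   # resets the server's latency/request counters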

For completeness, there are two areas below:

1. The first area shows latency results using different JVM settings for
GC.  It looks like CMS GC is best for the particular test I'm running.  I
haven't yet looked into all the various options (additional ones were
suggested by Kishore), but maybe performance can be further optimized once I
understand JAVA better and learn about its profiling tools, etc.

2. The second area is additional test data showing (confirming?) that GC is
a (the?) root cause. The data in the first area seems to confirm this as
well.

(Hope the stuff below is somewhat readable and that the formatting isn't
totally messed up.)



===============================================================
Area 1: JVM GC Tests
===============================================================

Below are some brute-force tests with different JVM settings.  It seems
that limiting the max heap tends to make GC cycles run faster (though I'm
not certain), so I'm limiting it to 256MB since my application requires less
than 100MB in reality.

Configuration:
A. Single ZK server (standalone mode) using a RAM disk with "snapCount=5000000"
(i.e. 5M) to ensure that snapshot files are not written during a test run
(none were).

B. Single Writer (running on the same server) creates 100,000 znodes @
~1500/sec, then sleeps for 250ms, and then deletes everything @ ~1500/sec.
The cycle is then repeated two more times, for a total of three data samples.

C. Different JVMFLAGS settings are used as noted.
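
For reference, each flag set was applied through conf/java.env, roughly like
the sketch below (shown with the CMS incremental variant plus the GC-logging
flags I used earlier to capture the pauses; swap in the other flag sets as
listed in the results):

# conf/java.env (sketch -- heap sizes and GC flags are the ones under test)
export JVMFLAGS="-server -Xms256m -Xmx256m \
  -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime"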

Below are the 'create' (C AVG / MAX) and 'delete' (D AVG / MAX) stats for
each test sample.

Java 1.7.0_60-ea 32-bit with JVMFLAGS="-server -Xms256m -Xmx256m"
- C: 3.7 / 39.0 ; D: 3.0 / 15.0
- C: 3.2 / 52.6 ; D: 3.8 / 186.7 (Full GC ran @ 167ms; normally just GC runs)
- C: 3.7 / 50.2 ; D: 3.3 / 18.8
AVERAGE = C: 3.5 / 47.3 ; D: 3.4 / 73.5

Java 1.7.0_60-ea 32-bit with JVMFLAGS="-server -Xms256m -Xmx256m -Xincgc"
- C: 4.1 / 83.3 ; D: 3.5 / 35.4
- C: 3.3 / 48.5 ; D: 3.4 / 38.1
- C: 3.6 / 78.7 ; D: 3.2 / 20.2
AVERAGE = C: 3.7 / 70.2 ; D: 3.4 / 31.2

Java 1.7.0_60-ea 32-bit with JVMFLAGS="-server -Xms256m -Xmx256m -XX:+UseParallelGC"
- C: 3.6 / 64.8 ; D: 3.4 / 24.4
- C: 3.0 / 68.1 ; D: 3.0 / 18.1
- C: 3.3 / 21.7 ; D: 3.2 / 16.4
AVERAGE = C: 3.3 / 51.5 ; D: 3.2 / 19.6

Java 1.7.0_60-ea 32-bit with JVMFLAGS="-server -Xms256m -Xmx256m -XX:+UseConcMarkSweepGC"
- C: 3.7 / 98.8 ; D: 3.2 / 20.8
- C: 3.3 / 48.7 ; D: 3.4 / 56.5
- C: 3.6 / 25.3 ; D: 3.3 / 25.5
AVERAGE = C: 3.5 / 57.6 ; D: 3.3 / 34.3

Java 1.7.0_60-ea 32-bit with JVMFLAGS="-server -Xms256m -Xmx256m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
- C: 3.6 / 43.9 ; D: 3.5 / 24.2
- C: 3.5 / 26.9 ; D: 3.2 / 19.3
- C: 3.6 / 41.7 ; D: 3.4 / 18.6
AVERAGE = C: 3.6 / 37.5 ; D: 3.4 / 20.7

Java 1.7.0_60-ea 32-bit with JVMFLAGS="-server -Xms256m -Xmx256m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"
- C: 3.8 / 44.3 ; D: 3.3 / 16.2
- C: 3.6 / 30.0 ; D: 3.2 / 22.7
- C: 3.4 / 40.1 ; D: 2.7 / 22.3
AVERAGE = C: 3.6 / 38.1 ; D: 3.1 / 20.4




===============================================================
Area 2: Raw App Data + JAVA GC Logs
===============================================================

Below is raw data from my tool (showing create latency) combined with the
JAVA GC logs.  Only a subset of the worst samples is shown, since the GC
algorithm ran continuously during this test run; the raw samples follow the
short sketch below.
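
(My tool itself isn't something I can share, but here is a minimal sketch of
how the MonoCreateAck timing could be taken with the standard ZooKeeper C
client.  The server address, znode path, and the crude sleep() at the end are
placeholders for illustration, not my actual code.)

/* sketch: time one async create with a monotonic clock, the way the
 * MonoCreateAck Start/End/Diff columns below are meant to be read */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <zookeeper/zookeeper.h>

static long long mono_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);   /* monotonic, unaffected by TOD changes */
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* completion callback: runs when the server acknowledges the create */
static void create_done(int rc, const char *value, const void *data) {
    long long start = *(const long long *)data;
    long long end = mono_ms();
    (void)value;
    printf("MonoCreateAck-StartMs-EndMs-DiffMs %lld %lld %lld (rc=%d)\n",
           start, end, end - start, rc);
}

int main(void) {
    /* "localhost:2181" and "/latency-test" are placeholders */
    zhandle_t *zh = zookeeper_init("localhost:2181", NULL, 30000, NULL, NULL, 0);
    if (!zh) return 1;

    static long long start;
    start = mono_ms();
    zoo_acreate(zh, "/latency-test", "x", 1, &ZOO_OPEN_ACL_UNSAFE,
                ZOO_EPHEMERAL, create_done, &start);

    sleep(2);                /* crude: give the async ack time to arrive */
    zookeeper_close(zh);
    return 0;
}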

// Latency suddenly increases from 3-4ms to 26ms right after GC executes (see
// the last column in each row); the GC entry below starts at 14:42:31.713 and
// reports real=0.02 secs, which lines up with the ~26ms jump on the next acks:
Item 7949 TodCreateAck 14:42:31 ms= 1393360951713
MonoCreateAck-StartMs-EndMs-DiffMs 81391773 81391776 3
Item 7950 TodCreateAck 14:42:31 ms= 1393360951713
MonoCreateAck-StartMs-EndMs-DiffMs 81391773 81391776 3
Item 7951 TodCreateAck 14:42:31 ms= 1393360951714
MonoCreateAck-StartMs-EndMs-DiffMs 81391773 81391777 4

2014-02-25T14:42:31.713-0600: [GC [PSYoungGen: 33120K->3680K(44544K)] 69580K->40140K(141056K), 0.0220970 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]

Item 7952 TodCreateAck 14:42:31 ms= 1393360951736
MonoCreateAck-StartMs-EndMs-DiffMs 81391773 81391799 26
Item 7953 TodCreateAck 14:42:31 ms= 1393360951736
MonoCreateAck-StartMs-EndMs-DiffMs 81391773 81391799 26
Item 7954 TodCreateAck 14:42:31 ms= 1393360951736
MonoCreateAck-StartMs-EndMs-DiffMs 81391773 81391799 26
.. etc ..

// Latency increases from 2ms to 29ms after GC is executed:
Item 50995 TodCreateAck 14:43:00 ms= 1393360980516
MonoCreateAck-StartMs-EndMs-DiffMs 81420577 81420579 2
Item 50996 TodCreateAck 14:43:00 ms= 1393360980516
MonoCreateAck-StartMs-EndMs-DiffMs 81420577 81420579 2
Item 50997 TodCreateAck 14:43:00 ms= 1393360980516
MonoCreateAck-StartMs-EndMs-DiffMs 81420577 81420579 2

2014-02-25T14:43:00.517-0600: [GC [PSYoungGen: 46832K->20416K(56320K)] 83292K->57364K(152832K), 0.0254560 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]

Item 50998 TodCreateAck 14:43:00 ms= 1393360980543
MonoCreateAck-StartMs-EndMs-DiffMs 81420577 81420606 29
Item 50999 TodCreateAck 14:43:00 ms= 1393360980543
MonoCreateAck-StartMs-EndMs-DiffMs 81420577 81420606 29
Item 51000 TodCreateAck 14:43:00 ms= 1393360980543
MonoCreateAck-StartMs-EndMs-DiffMs 81420577 81420606 29
.. etc ..

// Latency increases from 6-7ms to 32ms after GC is executed:
Item 57078 TodCreateAck 14:43:04 ms= 1393360984605
MonoCreateAck-StartMs-EndMs-DiffMs 81424662 81424668 6
Item 57079 TodCreateAck 14:43:04 ms= 1393360984605
MonoCreateAck-StartMs-EndMs-DiffMs 81424662 81424669 7
Item 57080 TodCreateAck 14:43:04 ms= 1393360984605
MonoCreateAck-StartMs-EndMs-DiffMs 81424662 81424669 7

2014-02-25T14:43:04.606-0600: [GC [PSYoungGen: 47104K->17408K(57344K)] 86780K->59844K(153856K), 0.0242830 secs] [Times: user=0.08 sys=0.00, real=0.02 secs]

Item 57081 TodCreateAck 14:43:04 ms= 1393360984631
MonoCreateAck-StartMs-EndMs-DiffMs 81424662 81424694 32
Item 57082 TodCreateAck 14:43:04 ms= 1393360984631
MonoCreateAck-StartMs-EndMs-DiffMs 81424662 81424694 32
Item 57083 TodCreateAck 14:43:04 ms= 1393360984631
MonoCreateAck-StartMs-EndMs-DiffMs 81424662 81424694 32
.. etc ..
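
(One more thing that may be useful: with -XX:+PrintGCApplicationStoppedTime
enabled, as in my earlier run, the worst stop-the-world pauses can be pulled
out of the GC output with something like the line below.  "gc.log" is just a
placeholder for wherever the GC output lands.)

grep "Total time for which application threads were stopped" gc.log | awk '{print $(NF-1)}' | sort -rn | head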


On Tue, Feb 25, 2014 at 9:56 AM, kishore g <g.kishore@gmail.com> wrote:

> My bad, I didn't really see the GC log entry. I agree with Camille and Phunt
> that GC is an unlikely cause here. Another thing to consider: do the
> latency spikes correlate to snapshot file generation? Are you writing the
> snapshots to RAM disk or to a spinning disk?
>
> Without additional info/code, we can only guess.
>
> thanks,
> Kishore G
>
>
> On Mon, Feb 24, 2014 at 3:18 PM, Patrick Hunt <phunt@apache.org> wrote:
>
> > fwiw I've used strace in the past for things like this (if you were to
> > rule out GC, I believe it could easily be GC in this case but that's
> > easy to identify). Had a nasty ssd write latency issue that we just
> > couldn't figure out. Using strace we were able to see that fsyncs were
> > taking a really long time in some cases. (bad disk firmware most
> > likely). YMMV but strace with grep/wc/awk/gnuplot is great for this
> > stuff.
> >
> > Patrick
> >
> > On Mon, Feb 24, 2014 at 2:13 PM, Camille Fournier <camille@apache.org>
> > wrote:
> > > You can try CMS. I don't think gc should be causing you pauses unless
> > > it's actually cleaning old gen, eden GC should be pauseless. You can
> > > tune the pool sizes to have enough space in old gen so you won't need
> > > pauses. The log you have printed above seems to be indicating the
> > > pauseless GC so I'm surprised it is causing noticeable performance
> > > degradation. But GC ergonomics aren't that hard to manage, especially
> > > in an application that should be used within existing process memory.
> > > Have you tried running this on an oracle JDK instead of OpenJDK?
> > >
> > > C
> > >
> > >
> > > On Mon, Feb 24, 2014 at 3:34 PM, kishore g <g.kishore@gmail.com> wrote:
> > >
> > >> try CMS garbage collector and see if it improves. I think you are
> > >> great at debugging, being new to JAVA and ZK, you were able to
> > >> correlate GC activity with latency spikes. Kudos for that.
> > >>
> > >> Try the following JVM Flags.
> > >>
> > >> -server -Xms<> -Xmx<> -XX:NewSize=<> -XX:MaxNewSize=<>
> > >> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70
> > >>
> > >> If you use disk as the backing store, I don't think you can get a
> > >> consistent read/write of 5ms. There are a lot of limitations in the
> > >> design (most of them are there to ensure consistency; for example,
> > >> every write ensures that the transaction log is fsynced before
> > >> acknowledging to the client).
> > >>
> > >> RAM disk might give you the performance but you need to be prepared
> > >> for the catastrophic scenario where all zookeepers go down.
> > >>
> > >> thanks,
> > >> Kishore G
> > >>
> > >>
> > >>
> > >> On Mon, Feb 24, 2014 at 9:36 AM, jmmec <jmmec2009@gmail.com> wrote:
> > >>
> > >> > Hey everyone,
> > >> >
> > >> > Did I mention that I'm a newbie to ZooKeeper and also to JAVA?   :)
> > >> >
> > >> > I enabled some JAVA GC logs via the "java.env" file:
> > >> >
> > >> > export JVMFLAGS="-Xms1024m -Xmx1024m -XX:+PrintGCDetails
> > >> > -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime"
> > >> >
> > >> > and confirmed that the periodic latency is due to JAVA GC operations.
> > >> >
> > >> > For example, below is a 26ms delay which corresponds to a 26ms delay
> > >> > that my test app also saw (it uses the C API and connects to ZK
> > >> > remotely) and as also reported by ZK, which is the only JAVA app
> > >> > running in the ZK cluster:
> > >> >
> > >> > 2014-02-24T10:29:51.905-0600: [GC [PSYoungGen: 275424K->12128K(305152K)]
> > >> > 325542K->73974K(1004544K), 0.0255720 secs] [Times: user=0.09 sys=0.00,
> > >> > real=0.03 secs]
> > >> > 2014-02-24T10:29:51.931-0600: Total time for which application threads
> > >> > were stopped: 0.0261350 seconds
> > >> >
> > >> > JAVA JVM tuning seems to be more of a black art than a science with
> > >> > respect to GC and other settings.  I was wondering if anyone has any
> > >> > practical advice for JVM settings for the following configuration:
> > >> >
> > >> > a) ZK 3-node cluster running OpenJDK 1.7; ZK is the only app running
> > >> > JAVA.
> > >> > b) Application znode data and watches will fit into < 100MB of RAM
> > >> > (say 250k znodes with ~150 bytes per znode with 2 watchers per znode)
> > >> >
> > >> > Consistent and fast read / write latency - say 5ms or less - is
> > >> > critical for the small dataset above.  I'm trying to understand if
> > >> > this is obtainable with ZK & JAVA.  I realize that other factors come
> > >> > into play as well (hardware / network).
> > >> >
> > >> > Thanks in advance for any advice.
> > >> >
> > >> >
> > >> > On Fri, Feb 21, 2014 at 7:51 AM, jmmec <jmmec2009@gmail.com> wrote:
> > >> >
> > >> > > Thanks Camille, I definitely understand!  :)
> > >> > >
> > >> > > The two questions at the top of mind regarding ZooKeeper are:
> > >> > > 1. How does it calculate latencies?  I can dig into its code to see.
> > >> > > 2. Is there anything in particular that might cause it to have the
> > >> > > spiky latency I've experienced?  I think I ruled out the snapshot
> > >> > > behavior by having a high snapCount.
> > >> > >
> > >> > > Some other things I am planning to explore:
> > >> > > 1. My test software is rightfully suspect, so I'll review it
> > >> > > carefully again and will simplify it further so that it is doing
> > >> > > the absolute bare minimum.
> > >> > > 2. I'm running OpenJDK 1.7.0_60-ea so might swap to an earlier
> > >> > > and/or different distribution.
> > >> > > 3. I'm running ZooKeeper 3.4.5 and might fall back to the 3.3.6
> > >> > > release.
> > >> > >
> > >> > > Hopefully one of the items above will reveal the root cause.  Any
> > >> > > other suggestions are welcome.
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Thu, Feb 20, 2014 at 7:57 PM, Camille Fournier
> > >> > > <camille@apache.org> wrote:
> > >> > >
> > >> > >> I might suggest that you create a personal github and mock up a
> > >> > >> replication there :) I understand employers that own your code but
> > >> > >> unless someone knows the answer off the top of their head, odds of
> > >> > >> finding the cause are low without something that replicates it,
> > >> > >> and knowing how busy most of us are here I don't know that we'll
> > >> > >> have time to do that for you.
> > >> > >>
> > >> > >> C
> > >> > >>
> > >> > >>
> > >> > >> On Thu, Feb 20, 2014 at 9:41 PM, jmmec <jmmec2009@gmail.com> wrote:
> > >> > >>
> > >> > >> > Thanks again,
> > >> > >> >
> > >> > >> > Unfortunately I can't share the test code since it is
> > >> > >> > technically the property of my employer.
> > >> > >> >
> > >> > >> > It's very strange behavior.  I think I've said that several
> > >> > >> > times now.
> > >> > >> > ha...
> > >> > >> >
> > >> > >> > Appreciate any additional help or advice or suggestions from
> > >> > >> > everyone and anyone and their brother or sister.
> > >> > >> >
> > >> > >> >
> > >> > >> >
> > >> > >> > On Thu, Feb 20, 2014 at 8:10 PM, Camille Fournier
> > >> > >> > <camille@apache.org> wrote:
> > >> > >> >
> > >> > >> > > Can you share the test code somewhere (github maybe?)?
> > >> > >> > >
> > >> > >> > > Thanks,
> > >> > >> > > C
> > >> > >> > >
> > >> > >> > >
> > >> > >> > > On Thu, Feb 20, 2014 at 9:08 PM, jmmec <jmmec2009@gmail.com> wrote:
> > >> > >> > >
> > >> > >> > > > Thanks for the quick reply.
> > >> > >> > > >
> > >> > >> > > > I did not try the "slow" test using a normal disk drive,
> > >> > >> > > > however I first discovered this problem when writing to a
> > >> > >> > > > 7200RPM disk drive at a much higher messaging rate (e.g.
> > >> > >> > > > 1500 to 3000 creates/sec rather than 84 creates/sec).  This
> > >> > >> > > > is what caused me to start simplifying the configuration
> > >> > >> > > > trying to find the root cause.  As part of that
> > >> > >> > > > investigation, I created a RAM disk to avoid the hard
> > >> > >> > > > drive, but the hard drive wasn't the problem.  I just
> > >> > >> > > > haven't switched back to the hard drive.
> > >> > >> > > >
> > >> > >> > > > I don't know what ZooKeeper is doing internally, or how &
> > >> > >> > > > why it is deriving 76ms MAX latency.  The very regular
> > >> > >> > > > periodic pattern suggests something odd.
> > >> > >> > > >
> > >> > >> > > > Hmmmm.....
> > >> > >> > > >
> > >> > >> > >
> > >> > >> >
> > >> > >>
> > >> > >
> > >> > >
> > >> >
> > >>
> >
>
