zookeeper-user mailing list archives

From Ariel Weisberg <aweisb...@voltdb.com>
Subject Re: sync vs. async vs. multi performances
Date Tue, 14 Feb 2012 23:18:30 GMT
Hi,

I tried to look at the presentations on the wiki, but the links aren't
working? I was using
http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations and the
error at the top of the page is "You are not allowed to do AttachFile on
this page. Login and try again."

I used the code at http://pastebin.com/uu7igM3J, and the results for 4 KB
writes are at http://pastebin.com/N26CJtQE: 8.5 milliseconds per write,
which is a bit slower than the 5 ms you mention. Is it possible to beat the
rotation speed?
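For reference, back-to-back synchronous appends to the same track tend to wait roughly one platter revolution each, and one revolution takes 60,000 / RPM milliseconds. A minimal sketch of that arithmetic for the 5.4k and 7.2k RPM disks mentioned in this thread (the class and method names are illustrative only):

```java
import java.util.Locale;

public class RotationalPeriod {
    // Duration of one full platter revolution, in milliseconds.
    static double msPerRevolution(double rpm) {
        return 60_000.0 / rpm;
    }

    public static void main(String[] args) {
        // Disk speeds mentioned in this thread.
        for (double rpm : new double[] {5400, 7200}) {
            System.out.printf(Locale.ROOT, "%.0f RPM: %.2f ms per revolution%n",
                    rpm, msPerRevolution(rpm));
        }
    }
}
```

It prints 11.11 ms for 5400 RPM and 8.33 ms for 7200 RPM, so a ~5 ms durable commit on a 7.2k RPM disk is shorter than one revolution, which is consistent with the write cache being on for those numbers.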

You can increase the write size quite a bit, to 240 KB, and it only goes up
to 10 milliseconds: http://pastebin.com/MSTwaHYN

My recollection was of numbers in the 12-14 ms range, but I may be thinking
of when I was pushing throughput.

Ariel

On Tue, Feb 14, 2012 at 4:02 PM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:

> Some of our previous measurements gave us around 5 ms; check the
> presentations we uploaded to the wiki. Those use 7.2k RPM disks, not just
> volatile storage or a battery-backed cache. We do have the write cache on
> for the numbers I'm referring to. There are also numbers there for when the
> write cache is off.
>
> -Flavio
>
> On Feb 14, 2012, at 9:48 PM, Ariel Weisberg wrote:
>
> > Hi,
> >
> > It's only a minute if you process each region serially. Process 100 or
> > 1000 in parallel and it will go a lot faster.
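Processing regions "in parallel" amounts to keeping many requests outstanding instead of waiting for each round trip. A minimal, library-agnostic sketch of bounding the number of in-flight calls with a Semaphore, assuming each call is exposed as a CompletableFuture. ZooKeeper's Java client reports async results through callbacks such as AsyncCallback.StringCallback, so wrapping them in futures is an assumption here; `runBounded`, the region list, and the simulated write are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

public class BoundedParallel {
    // Applies op to every item while keeping at most maxInFlight calls
    // outstanding. Results are returned in the same order as the items.
    static <T, R> List<R> runBounded(List<T> items, int maxInFlight,
                                     Function<T, CompletableFuture<R>> op)
            throws InterruptedException {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<R>> pending = new ArrayList<>();
        for (T item : items) {
            permits.acquire(); // blocks while maxInFlight calls are outstanding
            CompletableFuture<R> f = op.apply(item);
            f.whenComplete((result, error) -> permits.release());
            pending.add(f);
        }
        List<R> results = new ArrayList<>();
        for (CompletableFuture<R> f : pending) {
            results.add(f.join()); // throws CompletionException if a call failed
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for one async ZooKeeper write per region.
        List<Integer> regions = new ArrayList<>();
        for (int i = 0; i < 300; i++) {
            regions.add(i);
        }
        List<String> done = runBounded(regions, 100,
                r -> CompletableFuture.supplyAsync(() -> "/region_" + r));
        System.out.println(done.size() + " regions processed");
    }
}
```

The maxInFlight value (100 here) is the knob being discussed; the right setting depends on the server and is not something this sketch determines.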
> >
> > 20 milliseconds to synchronously commit to a 5.4k RPM disk is about
> > right, assuming the configuration is correct. On ext3 you need to mount
> > with barrier=1 (ext4 and xfs enable write barriers by default). If someone
> > is getting significantly faster numbers, they are probably writing to a
> > volatile or battery-backed cache.
> >
> > Performance is relative. The number of operations the DB can do is
> > roughly constant, although multi may be able to batch operations more
> > efficiently by amortizing the coordination overhead.
> >
> > In the synchronous case the DB is starved for work 99% of the time, so it
> > is not surprising that it is slow. You are benchmarking round-trip time in
> > that case, and that is dominated by the time it takes to synchronously
> > commit something to disk.
> >
> > In the asynchronous case there is plenty of work, and you can fully
> > utilize all the available throughput because each fsync makes multiple
> > operations durable. However, the work is still presented piecemeal, so
> > there is per-operation overhead.
> >
> > Caveat: I am on 3.3.3, so I haven't read how multi operations are
> > implemented, but the numbers you are getting bear this out. In the multi
> > case you get the benefit of keeping the DB fully utilized, plus the
> > coordination overhead is amortized across multiple operations, so you get
> > a boost in throughput beyond just async.
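The three regimes described above can be sketched as a back-of-the-envelope cost model. This is a simplified model, not how ZooKeeper is actually implemented: it assumes a single log device where one fsync costs fsyncMicros and makes durable every request already queued when it starts (group commit), plus a fixed per-request processing cost. The 20 ms commit comes from this thread; the per-operation costs in main (200 µs and 50 µs) are hypothetical values chosen for illustration, not measurements.

```java
import java.util.Locale;

public class CommitCostModel {
    // Sync: one request in flight at a time, so every request pays a full commit.
    static long syncMicros(int n, long fsyncMicros) {
        return n * fsyncMicros;
    }

    // Async: requests become ready perOpMicros apart (per-operation overhead,
    // must be positive); each fsync makes every request queued when it starts
    // durable.
    static long asyncMicros(int n, long fsyncMicros, long perOpMicros) {
        long t = 0;
        int durable = 0;
        while (durable < n) {
            int ready = (int) Math.min(n, t / perOpMicros); // prepared by time t
            if (ready == durable) {
                t = (durable + 1L) * perOpMicros; // idle until the next request
                continue;
            }
            t += fsyncMicros; // one fsync covers the whole queued batch
            durable = ready;
        }
        return t;
    }

    // Multi: one request carries all n operations, so there is a single commit
    // plus a (presumably smaller) per-operation cost inside the batch.
    static long multiMicros(int n, long fsyncMicros, long perOpInBatchMicros) {
        return n * perOpInBatchMicros + fsyncMicros;
    }

    public static void main(String[] args) {
        int n = 10_000;
        long fsync = 20_000; // 20 ms per synchronous commit
        System.out.printf(Locale.ROOT, "sync:  %.2f s%n", syncMicros(n, fsync) / 1e6);
        System.out.printf(Locale.ROOT, "async: %.2f s%n", asyncMicros(n, fsync, 200) / 1e6);
        System.out.printf(Locale.ROOT, "multi: %.2f s%n", multiMicros(n, fsync, 50) / 1e6);
    }
}
```

This prints 200.00 s, 2.02 s, and 0.52 s. The per-operation costs were picked so the output lands near the measurements reported in this thread; the point is only that the async total is dominated by n × perOpMicros rather than by fsyncs, which is the per-operation overhead described above.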
> >
> > Ariel
> >
> > On Tue, Feb 14, 2012 at 3:37 PM, N Keywal <nkeywal@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> Thanks for the replies.
> >>
> >> ZooKeeper is used when assigning regions (a kind of dataset) to region
> >> servers (a JVM process on a physical server). There is one ZooKeeper node
> >> per region. On a server failure there are typically a few hundred regions
> >> to reassign, with multiple statuses written in . On paper, if we need
> >> 0.02 s per node, that adds up to about a minute to recover, just for
> >> ZooKeeper.
> >>
> >> That's theory. I haven't done a precise measurement yet.
> >>
> >>
> >> Anyway, if ZooKeeper can be faster, it's always very interesting :-)
> >>
> >>
> >> Cheers,
> >>
> >> N.
> >>
> >>
> >> On Tue, Feb 14, 2012 at 8:00 PM, Ted Dunning <ted.dunning@gmail.com>
> >> wrote:
> >>
> >>> These results are about what is expected, although they might be a
> >>> little more extreme.
> >>>
> >>> I doubt very much that HBase is mutating ZK nodes fast enough for this to
> >>> matter much.
> >>>
> >>> On Feb 14, 2012, at 8:00, N Keywal <nkeywal@gmail.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I've run a test with ZooKeeper 3.4.2 to compare the performance of the
> >>>> synchronous, asynchronous, and multi APIs when creating znodes
> >>>> (variations on calling zk.create("/dummyTest", "dummy".getBytes(),
> >>>> ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT); 10,000 times). The
> >>>> code is at the end of the mail.
> >>>>
> >>>> I've tested different environments:
> >>>> - 1 Linux server running both the client and 1 ZooKeeper node
> >>>> - 1 Linux server for the client, 1 for 1 ZooKeeper node
> >>>> - 6 Linux servers: 1 for the client, 5 for 5 ZooKeeper nodes
> >>>>
> >>>> The servers are mid-range, with 4*2 cores and JDK 1.6. ZK had its own
> >>>> hard disk.
> >>>>
> >>>> But the results are comparable across these environments:
> >>>>
> >>>> Using the sync API, it takes 200 seconds for 10K creations, so around
> >>>> 0.02 seconds per call.
> >>>> Using the async API, it takes 2 seconds for 10K (including waiting for
> >>>> the last callback message).
> >>>> Using "multi", available since 3.4, it takes less than 1 second, again
> >>>> for 10K.
> >>>>
> >>>> I'm surprised by the time taken by the sync operation; I was not
> >>>> expecting it to be that slow. The gap between async and sync is huge.
> >>>>
> >>>> Is this expected? ZooKeeper is used in critical functions in
> >>>> Hadoop/HBase. I was looking at the possible benefits of using "multi",
> >>>> but they seem low compared to async (well, ~3 times faster :-). There
> >>>> are many small data creations/deletions with the sync API in the
> >>>> existing HBase algorithms, and it would not be simple to replace them
> >>>> all with asynchronous calls...
> >>>>
> >>>> Cheers,
> >>>>
> >>>> N.
> >>>>
> >>>> --
> >>>>
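> >>>> import java.lang.reflect.InvocationTargetException;
> >>>> import java.lang.reflect.Method;
> >>>> import java.util.ArrayList;
> >>>> import java.util.concurrent.CountDownLatch;
> >>>> import java.util.concurrent.atomic.AtomicInteger;
> >>>>
> >>>> import org.apache.zookeeper.*;
> >>>>
> >>>> public class ZookeeperTest {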
> >>>> static ZooKeeper zk;
> >>>> static int nbTests = 10000;
> >>>>
> >>>> private ZookeeperTest() {
> >>>> }
> >>>>
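> >>>> // Sync API: each create() blocks until the server has committed it,
> >>>> // so exactly one request is in flight at a time.
> >>>> public static void test11() throws Exception {
> >>>>   for (int i = 0; i < nbTests; ++i) {
> >>>>     zk.create("/dummyTest_" + i, "dummy".getBytes(),
> >>>>         ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
> >>>>   }
> >>>> }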
> >>>>
> >>>>
> >>>> // Async API: all creates are queued without waiting; the callback
> >>>> // counts completions, and the method returns once every callback fired.
> >>>> public static void test51() throws Exception {
> >>>>   final AtomicInteger counter = new AtomicInteger(0);
> >>>>
> >>>>   for (int i = 0; i < nbTests; ++i) {
> >>>>     zk.create("/dummyTest_" + i, "dummy".getBytes(),
> >>>>         ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT,
> >>>>         new AsyncCallback.StringCallback() {
> >>>>           public void processResult(int rc, String path, Object ctx,
> >>>>               String name) {
> >>>>             // rc is not checked: failed creates are counted as well
> >>>>             counter.incrementAndGet();
> >>>>           }
> >>>>         }, null);
> >>>>   }
> >>>>
> >>>>   // Wait for the last callback before the caller stops the clock.
> >>>>   while (counter.get() != nbTests) {
> >>>>     Thread.sleep(1);
> >>>>   }
> >>>> }
> >>>>
> >>>> // Multi: all creates are sent as a single transaction in one request.
> >>>> public static void test41() throws Exception {
> >>>>   ArrayList<Op> ops = new ArrayList<Op>(nbTests);
> >>>>   for (int i = 0; i < nbTests; ++i) {
> >>>>     ops.add(Op.create("/dummyTest_" + i, "dummy".getBytes(),
> >>>>         ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT));
> >>>>   }
> >>>>
> >>>>   zk.multi(ops);
> >>>> }
> >>>>
> >>>> // Cleanup: deletes all test znodes in one multi; version -1 matches
> >>>> // any version.
> >>>> public static void delete() throws Exception {
> >>>>   ArrayList<Op> ops = new ArrayList<Op>(nbTests);
> >>>>
> >>>>   for (int i = 0; i < nbTests; ++i) {
> >>>>     ops.add(Op.delete("/dummyTest_" + i, -1));
> >>>>   }
> >>>>
> >>>>   zk.multi(ops);
> >>>> }
> >>>>
> >>>>
> >>>> public static void test(String connection, String testName)
> >>>>     throws Throwable {
> >>>>   Method m = ZookeeperTest.class.getMethod(testName);
> >>>>
> >>>>   // The constructor returns before the session is established, so wait
> >>>>   // for SyncConnected to keep connection setup out of the timing.
> >>>>   final CountDownLatch connected = new CountDownLatch(1);
> >>>>   zk = new ZooKeeper(connection, 20000, new Watcher() {
> >>>>     public void process(WatchedEvent watchedEvent) {
> >>>>       if (watchedEvent.getState() == Watcher.Event.KeeperState.SyncConnected) {
> >>>>         connected.countDown();
> >>>>       }
> >>>>     }
> >>>>   });
> >>>>   connected.await();
> >>>>
> >>>>   final long start = System.currentTimeMillis();
> >>>>
> >>>>   try {
> >>>>     m.invoke(null);
> >>>>   } catch (InvocationTargetException e) {
> >>>>     // Rethrow the exception raised by the test method itself.
> >>>>     throw e.getTargetException();
> >>>>   }
> >>>>
> >>>>   final long end = System.currentTimeMillis();
> >>>>
> >>>>   zk.close();
> >>>>
> >>>>   System.out.println(testName + ":  ExeTime= " + (end - start));
> >>>> }
> >>>>
> >>>> // Usage: java ZookeeperTest <connectString> <test11|test51|test41|delete>
> >>>> public static void main(String... args) throws Throwable {
> >>>>   test(args[0], args[1]);
> >>>> }
> >>>> }
> >>>
> >>
>
> flavio junqueira
>
> research scientist
>
> fpj@yahoo-inc.com
> direct +34 93-183-8828
>
> avinguda diagonal 177, 8th floor, barcelona, 08018, es
> phone (408) 349 3300    fax (408) 349 3301
>
>
