zookeeper-user mailing list archives

From Flavio Junqueira <...@yahoo-inc.com>
Subject Re: sync vs. async vs. multi performances
Date Fri, 17 Feb 2012 08:41:57 GMT
Hi Ariel,

That wiki is stale. Check it here:

	https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeperPresentations

In particular, check the HIC talk, slide 57. We were using 1 KB writes for those tests.

-Flavio

On Feb 15, 2012, at 12:18 AM, Ariel Weisberg wrote:

> Hi,
> 
> I tried to look at the presentations on the wiki, but the links aren't
> working? I was using
> http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations and the
> error at the top of the page is "You are not allowed to do AttachFile on
> this page. Login and try again."
> 
> I used the test at http://pastebin.com/uu7igM3J, and the results for 4k writes
> are at http://pastebin.com/N26CJtQE: 8.5 milliseconds, which is a bit slower
> than 5 ms. Is it possible to beat the rotation speed?
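> 
> (As a rough back-of-envelope, assuming the commit has to wait for the platter:
> a 7200 RPM disk completes one revolution in 60000/7200 ≈ 8.3 ms and a 5400 RPM
> disk in 60000/5400 ≈ 11.1 ms, so without a write cache a synchronous commit is
> hard to push much below those figures.)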
> 
> You can increase the write size quite a bit to 240k and it only goes up to
> 10 milliseconds. http://pastebin.com/MSTwaHYN
> 
> My recollection was of being in the 12-14 ms range, but I may be thinking of
> when I was pushing throughput.
> 
> Ariel
> 
> On Tue, Feb 14, 2012 at 4:02 PM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:
> 
>> Some of our previous measurements gave us around 5 ms; check some of the
>> presentations we uploaded to the wiki. Those used 7.2k RPM disks, not just
>> volatile storage or a battery-backed cache. We do have the write cache on
>> for the numbers I'm referring to; there are also numbers there for the case
>> where the write cache is off.
>> 
>> -Flavio
>> 
>> On Feb 14, 2012, at 9:48 PM, Ariel Weisberg wrote:
>> 
>>> Hi,
>>> 
>>> It's only a minute if you process each region serially. Process 100 or 1000
>>> in parallel and it will go a lot faster.
>>> 
>>> 20 milliseconds to synchronously commit to a 5.4k RPM disk is about right,
>>> assuming the configuration is correct. On ext3 you need to mount with
>>> barrier=1 (ext4 and xfs enable write barriers by default). If someone is
>>> getting significantly faster numbers, they are probably writing to a
>>> volatile or battery-backed cache.
>>> 
>>> Performance is relative. The number of operations per second the DB can do
>>> is roughly constant, although multi may be able to batch operations more
>>> efficiently by amortizing the coordination overhead.
>>> 
>>> In the synchronous case the DB is starved for work 99% of the time, so it is
>>> not surprising that it is slow. You are benchmarking round-trip time in that
>>> case, and that is dominated by the time it takes to synchronously commit
>>> something to disk.
>>> 
>>> In the asynchronous case there is plenty of work, and you can fully utilize
>>> all the available throughput because each fsync makes multiple operations
>>> durable. However, the work is still presented piecemeal, so there is still
>>> per-operation overhead.
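>>> 
>>> As a rough back-of-envelope, assuming ~20 ms per synchronous commit: 10,000
>>> serial creates cost about 10,000 x 0.02 s = 200 s, whereas if the server can
>>> group on the order of 100 outstanding creates behind each fsync, the same
>>> 10,000 creates need only ~100 fsyncs, i.e. a couple of seconds. That matches
>>> the numbers below.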
>>> 
>>> Caveat: I am on 3.3.3, so I haven't read how multi operations are
>>> implemented, but the numbers you are getting bear this out. In the multi
>>> case you get the benefit of keeping the DB fully utilized plus amortizing
>>> the coordination overhead across multiple operations, so you get a
>>> throughput boost beyond just async.
>>> 
>>> Ariel
>>> 
>>> On Tue, Feb 14, 2012 at 3:37 PM, N Keywal <nkeywal@gmail.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Thanks for the replies.
>>>> 
>>>> It's used when assigning the regions (kind of dataset) to the regionserver
>>>> (jvm process on a physical server). There is one zookeeper node (znode) per
>>>> region. On a server failure, there are typically a few hundred regions to
>>>> reassign, with multiple status updates written to ZooKeeper for each. On
>>>> paper, if we need 0.02s per node, that puts the recovery at around a
>>>> minute, just for zookeeper.
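>>>> 
>>>> (As an illustration, with assumed counts: 500 regions x 6 status writes x
>>>> 0.02 s per write = 60 s, i.e. about a minute spent on znode updates alone.)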
>>>> 
>>>> That's theory. I haven't done a precise measurement yet.
>>>> 
>>>> 
>>>> Anyway, if ZooKeeper can be faster, it's always very interesting :-)
>>>> 
>>>> 
>>>> Cheers,
>>>> 
>>>> N.
>>>> 
>>>> 
>>>> On Tue, Feb 14, 2012 at 8:00 PM, Ted Dunning <ted.dunning@gmail.com>
>>>> wrote:
>>>> 
>>>>> These results are about what is expected, although they might be a little
>>>>> more extreme.
>>>>> 
>>>>> I doubt very much that hbase is mutating zk nodes fast enough for this to
>>>>> matter much.
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>> On Feb 14, 2012, at 8:00, N Keywal <nkeywal@gmail.com> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I've done a test with ZooKeeper 3.4.2 to compare the performance of
>>>>>> synchronous vs. asynchronous vs. multi when creating znodes (variations
>>>>>> around calling, 10000 times, zk.create("/dummyTest", "dummy".getBytes(),
>>>>>> ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);). The code is at the
>>>>>> end of the mail.
>>>>>> 
>>>>>> I've tested different environments:
>>>>>> - 1 linux server with the client and 1 zookeeper node on the same machine
>>>>>> - 1 linux server for the client, 1 for 1 zookeeper node
>>>>>> - 6 linux servers, 1 for the client, 5 for 5 zookeeper nodes
>>>>>> 
>>>>>> Servers are mid-range, with 4*2 cores and jdk 1.6. ZK was on its own HD.
>>>>>> 
>>>>>> But the results are comparable:
>>>>>> 
>>>>>> Using the sync API, it takes 200 seconds for 10K creations, so around
>>>>>> 0.02 seconds per call.
>>>>>> Using the async API, it takes 2 seconds for 10K (including waiting for
>>>>>> the last callback message).
>>>>>> Using the "multi" available since 3.4, it takes less than 1 second,
>>>>>> again for 10K.
>>>>>> 
>>>>>> I'm surprised by the time taken by the sync operation; I was not expecting
>>>>>> it to be that slow. The gap between async & sync is quite huge.
>>>>>> 
>>>>>> Is this something expected? ZooKeeper is used in critical functions in
>>>>>> Hadoop/HBase, and I was looking at the possible benefits of using "multi",
>>>>>> but the gain seems low compared to async (well, ~3 times faster :-). There
>>>>>> are many small data creations/deletions with the sync API in the existing
>>>>>> hbase algorithms, and it would not be simple to replace them all with
>>>>>> asynchronous calls...
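>>>>>> 
>>>>>> For illustration only, a rough sketch of what such a replacement could
>>>>>> look like (createAllAsync is an assumed helper, not part of the test code
>>>>>> below; it needs java.util.List, java.util.concurrent.CountDownLatch and
>>>>>> the org.apache.zookeeper imports): fire the creates asynchronously, then
>>>>>> wait on a latch, so the caller keeps sync-like semantics while the server
>>>>>> can batch the log syncs.
>>>>>> 
>>>>>> static void createAllAsync(ZooKeeper zk, List<String> paths, byte[] data)
>>>>>>     throws InterruptedException {
>>>>>>   final CountDownLatch done = new CountDownLatch(paths.size());
>>>>>>   for (String path : paths) {
>>>>>>     // Async create: returns immediately, completion arrives in the callback.
>>>>>>     zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT,
>>>>>>         new AsyncCallback.StringCallback() {
>>>>>>           public void processResult(int rc, String p, Object ctx, String name) {
>>>>>>             done.countDown();  // one fewer outstanding create
>>>>>>           }
>>>>>>         }, null);
>>>>>>   }
>>>>>>   done.await();  // all creates acknowledged (a real version would check rc)
>>>>>> }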
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> N.
>>>>>> 
>>>>>> --
>>>>>> 
>>>>>> import java.lang.reflect.InvocationTargetException;
>>>>>> import java.lang.reflect.Method;
>>>>>> import java.util.ArrayList;
>>>>>> import java.util.concurrent.atomic.AtomicInteger;
>>>>>> 
>>>>>> import org.apache.zookeeper.AsyncCallback;
>>>>>> import org.apache.zookeeper.CreateMode;
>>>>>> import org.apache.zookeeper.Op;
>>>>>> import org.apache.zookeeper.WatchedEvent;
>>>>>> import org.apache.zookeeper.Watcher;
>>>>>> import org.apache.zookeeper.ZooDefs;
>>>>>> import org.apache.zookeeper.ZooKeeper;
>>>>>> 
>>>>>> public class ZookeeperTest {
>>>>>> static ZooKeeper zk;
>>>>>> static int nbTests = 10000;
>>>>>> 
>>>>>> private ZookeeperTest() {
>>>>>> }
>>>>>> 
>>>>>> // Sync API: each create blocks on a full round trip (and log sync).
>>>>>> public static void test11() throws Exception {
>>>>>>  for (int i = 0; i < nbTests; ++i) {
>>>>>>    zk.create("/dummyTest_" + i, "dummy".getBytes(),
>>>>>> ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
>>>>>>  }
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> // Async API: fire all creates, then wait for every callback to complete.
>>>>>> public static void test51() throws Exception {
>>>>>>  final AtomicInteger counter = new AtomicInteger(0);
>>>>>> 
>>>>>>  for (int i = 0; i < nbTests; ++i) {
>>>>>>    zk.create("/dummyTest_" + i, "dummy".getBytes(),
>>>>>> ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT,
>>>>>>      new AsyncCallback.StringCallback() {
>>>>>>        public void processResult(int rc, String path, Object ctx, String name) {
>>>>>>          counter.incrementAndGet();
>>>>>>        }
>>>>>>      }, null);
>>>>>>  }
>>>>>> 
>>>>>>  while (counter.get() != nbTests) {
>>>>>>    Thread.sleep(1);
>>>>>>  }
>>>>>> }
>>>>>> 
>>>>>> // Multi: submit all creates as a single batched request.
>>>>>> public static void test41() throws Exception {
>>>>>>  ArrayList<Op> ops = new ArrayList<Op>(nbTests);
>>>>>>  for (int i = 0; i < nbTests; ++i) {
>>>>>>    ops.add(
>>>>>>      Op.create("/dummyTest_" + i, "dummy".getBytes(),
>>>>>> ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)
>>>>>>    );
>>>>>>  }
>>>>>> 
>>>>>>  zk.multi(ops);
>>>>>> }
>>>>>> 
>>>>>> // Cleanup: remove all test znodes in a single multi.
>>>>>> public static void delete() throws Exception {
>>>>>>  ArrayList<Op> ops = new ArrayList<Op>(nbTests);
>>>>>> 
>>>>>>  for (int i = 0; i < nbTests; ++i) {
>>>>>>    ops.add(
>>>>>>      Op.delete("/dummyTest_" + i,-1)
>>>>>>    );
>>>>>>  }
>>>>>> 
>>>>>>  zk.multi(ops);
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> // Connects, runs the named test method by reflection, and times it.
>>>>>> public static void test(String connection, String testName) throws Throwable {
>>>>>>  Method m = ZookeeperTest.class.getMethod(testName);
>>>>>> 
>>>>>>  zk = new ZooKeeper(connection, 20000, new Watcher() {
>>>>>>    public void process(WatchedEvent watchedEvent) {
>>>>>>    }
>>>>>>  });
>>>>>> 
>>>>>>  final long start = System.currentTimeMillis();
>>>>>> 
>>>>>>  try {
>>>>>>    m.invoke(null);
>>>>>>  } catch (IllegalAccessException e) {
>>>>>>    throw e;
>>>>>>  } catch (InvocationTargetException e) {
>>>>>>    throw e.getTargetException();
>>>>>>  }
>>>>>> 
>>>>>>  final long end = System.currentTimeMillis();
>>>>>> 
>>>>>>  zk.close();
>>>>>> 
>>>>>>  final long endClose = System.currentTimeMillis();
>>>>>> 
>>>>>>  System.out.println(testName + ":  ExeTime= " + (end - start)
>>>>>>      + "  CloseTime= " + (endClose - end));
>>>>>> }
>>>>>> 
>>>>>> public static void main(String... args) throws Throwable {
>>>>>>    test(args[0], args[1]);
>>>>>> }
>>>>>> }
>>>>> 
>>>> 
>> 

flavio
junqueira
 
research scientist
 
fpj@yahoo-inc.com
direct +34 93-183-8828
 
avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301

