zookeeper-user mailing list archives

From "Ibrahim El-sanosi (PGR)" <i.s.el-san...@newcastle.ac.uk>
Subject RE: Latency in asynchronous mode
Date Fri, 24 Oct 2014 23:04:32 GMT
Kishore,

By "my work focuses on latency, therefore it is better to use sync mode", I mean that I want
to measure the latency of each request while sending requests back to back (i.e., sending a
large number of requests continuously). In this case, I said it is better to use sync mode.
Am I right here?

In my scenario (sending a large number of requests continuously), if I use async mode it will
be difficult to measure the latency of each request, because group commit prevents me from
obtaining per-request latency. I tried to measure per-request latency in async mode using the
following sample code:

    submitTimeWrite = (double) System.nanoTime(); // capture the start time of sending the request
    _client.create().inBackground(new Double(time)).forPath(_path + "/" + _count, data); // create a node in ZooKeeper
    endTimeWrite = (double) System.nanoTime(); // capture the end time of the executed request
    latencyInfos.add("" + ((endTimeWrite - submitTimeWrite) / 1000000)); // add the per-request latency (ms) to the array list

But the results obtained with the above code are completely different from the stat command's
results. The following sample results (in milliseconds) were generated using the above code:
0.004395
0.004297
0.004256
0.004308
0.004353
0.004293
0.004309
0.004421
0.004325

I am really surprised by this result, and it does not make sense to me, as the time is very
low.

However, using the four-letter command (stat), I got the following result:
Latency min/avg/max: 252/368/505
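When comparing many runs, the `Latency min/avg/max` line can be pulled out of the `stat` output programmatically. A minimal sketch, assuming the raw output has already been fetched (for example with `echo stat | nc <host> 2181`); `StatLatency` and `parse` are hypothetical names, and the pattern assumes only the line format shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that extracts min/avg/max from the text printed by "stat". */
final class StatLatency {
    private static final String NUM = "(\\d+(?:\\.\\d+)?)";
    // Matches e.g. "Latency min/avg/max: 252/368/505"; decimals are accepted in case
    // a server prints fractional values.
    private static final Pattern LATENCY =
            Pattern.compile("Latency min/avg/max:\\s*" + NUM + "/" + NUM + "/" + NUM);

    /** Returns {min, avg, max} in milliseconds, or null if no latency line is found. */
    static double[] parse(String statOutput) {
        Matcher m = LATENCY.matcher(statOutput);
        if (!m.find()) {
            return null;
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2)),
            Double.parseDouble(m.group(3))
        };
    }
}
```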

I think the latency reported by the stat command is measured per group commit (one batch).
But I cannot understand why my code gives me a much lower latency than the stat command.
Note that I use the same code to measure per-request latency in sync mode (except that for
the create operation I use the synchronous create), and the results were essentially
identical to the stat command's.
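Curator's background operations (`inBackground(...)`) return without waiting for the server's reply, so timing the call itself mostly measures the cost of submitting the request, not its round trip. A minimal sketch of the alternative: record the start time before submitting and compute the latency when the completion callback fires. To keep it self-contained, the ZooKeeper client is replaced by a simulated asynchronous call (`simulatedCreate`, hypothetical) that completes after a fixed delay; this is not Curator code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

final class AsyncLatencyDemo {
    /** Hypothetical stand-in for an async create whose reply arrives after delayMs. */
    static CompletableFuture<String> simulatedCreate(String path, long delayMs) {
        return CompletableFuture.supplyAsync(() -> path,
                CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS));
    }

    /** Returns {msToReturnFromCall, msToCompletion} for a single request. */
    static double[] measureOnce(String path, long delayMs) {
        long start = System.nanoTime();
        // The callback records the time at which the (simulated) reply arrives.
        CompletableFuture<Long> replyTime = simulatedCreate(path, delayMs)
                .thenApply(createdPath -> System.nanoTime());
        long returned = System.nanoTime(); // timing like the original snippet: submission only
        long completed = replyTime.join(); // blocking here only so the demo can return the value
        return new double[] {
            (returned - start) / 1_000_000.0,
            (completed - start) / 1_000_000.0
        };
    }
}
```

With many requests in flight, each request's start time would have to travel with it, for example as the context object that the snippet above already passes to `inBackground`, read back via `CuratorEvent.getContext()` in a `BackgroundCallback` or listener, which would then compute `System.nanoTime() - start`.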

Any thoughts?

Thank you

Note that I use 10 clients to send requests concurrently. 

-----Original Message-----
From: kishore g [mailto:g.kishore@gmail.com] 
Sent: Friday, October 24, 2014 03:44 PM
To: user@zookeeper.apache.org
Subject: Re: Latency in asynchronous mode

You got the idea and concepts right, but the conclusion ("use the Sync mode to test the
performance because my work focus in latency.") is not necessarily true. Even with the async
API, if the client sends only one request at a time (i.e., makes an async call, waits for the
ack from the server, then sends another request), you should see the same result as you would
with the sync API. At the same time, using the sync API does not necessarily guarantee low
latency, as that depends on how many client requests are concurrently handled on the server
side.
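The point that an async API with only one outstanding request behaves like the sync API can be sketched as follows. A simulated async write (`asyncWrite`, hypothetical, acknowledged after a fixed delay) stands in for the ZooKeeper client; each request is issued only after the previous acknowledgement arrives.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

final class OneAtATime {
    /** Hypothetical async write that is acknowledged after delayMs. */
    static CompletableFuture<Integer> asyncWrite(int id, long delayMs) {
        return CompletableFuture.supplyAsync(() -> id,
                CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS));
    }

    /** Sends n writes, waiting for each ack before the next; returns total wall-clock ms. */
    static double run(int n, long delayMs) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            asyncWrite(i, delayMs).join(); // async call + wait for ack = a sync call in effect
        }
        return (System.nanoTime() - start) / 1_000_000.0;
    }
}
```

For example, `run(5, 20)` takes at least 100 ms: every request pays the full simulated round trip, exactly as a loop of sync calls would.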

The reason both sync and async API options are provided to the client is that picking one
vs. the other depends on the client usage pattern. This is how I would decide between sync
and async. Please note this is based on my experience building Apache Helix, S4, etc. Others,
feel free to correct me if something is wrong. I am assuming your ZK access is from a single
thread on the client side:

   - If most of my interaction with ZooKeeper consists of
   creating/updating/deleting/reading only one znode at a time from a single
   thread, I will use the sync API for its simplicity.
   - If I have to write/read a large number of records in one go from a
   single thread, I will use the async API.
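For the second case, a sketch under the same assumptions: all writes are issued back to back from one thread, which then waits once for every acknowledgement. The simulated server acknowledges each write after a fixed delay with unlimited parallelism; that is an idealization, since a real server queues and batches requests, so the real speed-up would be smaller.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

final class PipelinedWrites {
    /** Hypothetical async write acknowledged after delayMs (no server-side queueing modeled). */
    static CompletableFuture<Integer> asyncWrite(int id, long delayMs) {
        return CompletableFuture.supplyAsync(() -> id,
                CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS));
    }

    /** Issues all n writes back to back, then waits once for every ack; returns total ms. */
    static double run(int n, long delayMs) {
        CompletableFuture<?>[] pending = new CompletableFuture<?>[n];
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            pending[i] = asyncWrite(i, delayMs); // do not wait here
        }
        // A real client would also inspect each result for errors (e.g. retry or report).
        CompletableFuture.allOf(pending).join();
        return (System.nanoTime() - start) / 1_000_000.0;
    }
}
```

Here `run(50, 100)` finishes in roughly one delay rather than fifty, because the requests overlap instead of waiting on each other.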

I have purposely restricted the options to a single thread modifying different records. If
you have multiple threads, there are some corner cases, and they require some fancy
optimizations to get good latency. In general, if you can absorb the complexity of the async
API (especially dealing with error cases), go for async; otherwise stick with sync until
performance becomes really critical. As an example, we started with the sync API while
building Helix, and when we realized that failover performance is critical, we started using
async and even added additional features such as grouping of multiple requests on the client
side, grouping of notifications, etc.
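Client-side grouping can take many forms, and the thread does not describe how Helix implements it. The following is only a hypothetical sketch of the general idea: requests are buffered and handed to a sink as a group once `maxGroupSize` is reached or `flush()` is called. What the sink does (for example, issuing one ZooKeeper `multi()` call, which would also make the group atomic) is left as an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical client-side grouper. Not thread-safe: assumes the single-threaded client above. */
final class RequestGrouper<T> {
    private final int maxGroupSize;
    private final Consumer<List<T>> sink; // e.g. one round trip carrying the whole group
    private List<T> buffer = new ArrayList<>();

    RequestGrouper(int maxGroupSize, Consumer<List<T>> sink) {
        if (maxGroupSize < 1) {
            throw new IllegalArgumentException("maxGroupSize must be >= 1");
        }
        this.maxGroupSize = maxGroupSize;
        this.sink = sink;
    }

    /** Buffers one request and sends the group as soon as it is full. */
    void add(T request) {
        buffer.add(request);
        if (buffer.size() >= maxGroupSize) {
            flush();
        }
    }

    /** Sends whatever is buffered (e.g. from a timer or before shutdown). */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> group = buffer;
        buffer = new ArrayList<>();
        sink.accept(group);
    }
}
```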

Hope this helps.

thanks,
Kishore G



On Thu, Oct 23, 2014 at 3:31 PM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:

> Wow, Kishore, you put the train on the tracks.
>
> Yes, you are absolutely right, your answer makes sense to me.
>
> First of all, I concentrate on latency, not throughput. Therefore,
> it is better to use the sync mode rather than the async mode.
> Second, I measured latency and throughput in both async and
> sync mode; sample results are as follows:
>
> 1- Sync mode, one client sending CREATE requests for 30 seconds:
> Latency min/avg/max: 7/25/55
> Throughput 224
>
> 2- Async mode, one client sending CREATE requests for 30 seconds:
> Latency min/avg/max: 224/344/507
> Throughput 3641
>
> The above results support your reasoning: using a single fsync per
> request (sync mode) lowers latency but decreases throughput, whereas
> when multiple requests are batched (async mode), latency is higher and
> throughput increases.
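A quick consistency check on these numbers is Little's law, $L = \lambda W$, which relates the average number of requests in flight $L$ to throughput $\lambda$ and average latency $W$. This assumes the throughput figures are requests per second and that the `stat` averages are a fair proxy for the latency the client sees; neither is stated above.

```latex
\begin{aligned}
L &= \lambda W \\
L_{\text{sync}}  &\approx 224\ \text{s}^{-1} \times 0.025\ \text{s} \approx 5.6 \\
L_{\text{async}} &\approx 3641\ \text{s}^{-1} \times 0.344\ \text{s} \approx 1252
\end{aligned}
```

About 5.6 requests in flight in the sync run is consistent with a handful of blocking threads (the benchmark described later in the thread uses 5 threads per client), each waiting for its own reply. More than a thousand in flight in the async run suggests that most of the 344 ms average is spent waiting behind other requests rather than in any single fsync.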
>
> To conclude, I should use the sync mode to test performance
> because my work focuses on latency. Am I right?
>
> Thank you
>
> Ibrahim
>
>
> -----Original Message-----
> From: kishore g [mailto:g.kishore@gmail.com]
> Sent: Thursday, October 23, 2014 08:14 PM
> To: user@zookeeper.apache.org
> Subject: Re: Latency in asynchronous mode
>
> The async API in ZooKeeper is a way to achieve high throughput by trading
> off latency. As others have explained, before returning success to the
> user, ZooKeeper always ensures that the entry is flushed to the
> transaction log. This operation is expensive and can take around 5-20 ms on a spinning disk.
> So if ZooKeeper followed the naive approach of invoking fsync for every
> request, it would be able to handle approximately 200 transactions per second
> at most (again depending on the fsync time). But the server does a
> further optimization by batching multiple requests into one fsync,
> commonly known as group commit. This of course impacts latency,
> because each request now incurs some additional latency: the amount
> of data written to disk is proportional to the batch size, whereas
> with sync requests each request writes only data proportional to
> that request.
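The trade-off can be illustrated with a toy model; it is not ZooKeeper's actual `SyncRequestProcessor` logic. Assumptions: one fsync costs a fixed amount plus a small per-request amount for the extra bytes written, a sync client has exactly one request outstanding, and an async burst arrives all at once and is flushed in batches of up to `maxBatch`. The cost figures passed in are illustrative only.

```java
/** Toy group-commit model: fsync cost = fsyncFixedMs + perRequestMs * requestsInThatFsync. */
final class GroupCommitModel {
    record Result(double avgLatencyMs, double totalMs, double throughputPerSec) {}

    /** Sync client: one request outstanding, so every fsync carries exactly one request. */
    static Result sync(int requests, double fsyncFixedMs, double perRequestMs) {
        double latency = fsyncFixedMs + perRequestMs;
        double total = latency * requests;
        return new Result(latency, total, requests / (total / 1000.0));
    }

    /** Async burst: all requests queued at time 0 and flushed in batches of up to maxBatch. */
    static Result async(int requests, double fsyncFixedMs, double perRequestMs, int maxBatch) {
        if (requests < 1 || maxBatch < 1) {
            throw new IllegalArgumentException("requests and maxBatch must be >= 1");
        }
        double clock = 0;
        double latencySum = 0;
        for (int remaining = requests; remaining > 0; ) {
            int batch = Math.min(maxBatch, remaining);
            clock += fsyncFixedMs + perRequestMs * batch; // one fsync writes the whole batch
            latencySum += clock * batch;                  // every request in the batch is acked now
            remaining -= batch;
        }
        return new Result(latencySum / requests, clock, requests / (clock / 1000.0));
    }
}
```

With 1000 requests, 5 ms fixed and 0.01 ms per request, `sync` gives 5.01 ms per request at about 200 writes/s, while `async(..., 1000)` acknowledges every request after 15 ms, which is also the wall-clock time for the whole burst (tens of thousands of writes/s), matching the 1000-request example that follows. With `maxBatch = 1`, the same burst waits in line behind earlier fsyncs and averages about 2.5 s per request, the queueing effect Alexander mentions further down.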
>
> So coming back to your question, what you should really be measuring
> is the amortized latency on a per-write basis from the client. With
> the sync API, it's unlikely that group flush kicks in, because the
> client waits for the ack of the previous request before sending a
> new request. So you are seeing low latency, but the total number of
> writes done during your test would be low compared to the async API
> test. With async, the client does not wait for the ack of each write
> in the same thread, so all clients keep sending requests, which means
> there is a high chance of group flush kicking in on the server side,
> and because of this the perceived latency can be higher. For example,
> if you send 1000 requests back to back in async mode, you might see a
> latency of X ms for each request, but if you measure the wall-clock
> time from the start of the first request to the last ack received from
> the server, it would also be around X ms, so the amortized latency per
> write is only about X/1000 ms.
>
> A good way to understand this is to measure both latency and
> throughput (total number of writes) from your client. Also, it's not
> clear whether the clients are creating new znodes or updating the
> same one. If they are updating the same one, there might be some
> conflicts that add latency. And are the clients doing both reads
> and writes? There are other design choices in ZooKeeper, such as a
> single queue for both reads and writes, that might impact latency as
> well. It might help if you share the client code.
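Measuring both numbers on the client side can be as simple as the following sketch: each completed request reports its own start and end timestamps, and the summary prints client-observed latency in the same min/avg/max shape as `stat`, plus throughput over the run. `ClientStats` is a hypothetical name; where the timestamps come from (a sync call or an async callback) is up to the benchmark.

```java
import java.util.Locale;

/** Hypothetical thread-safe recorder for client-observed latency and throughput. */
final class ClientStats {
    private long count;
    private long minNanos = Long.MAX_VALUE;
    private long maxNanos;
    private long totalNanos;

    /** Call once per completed request with that request's own start and end times. */
    synchronized void record(long startNanos, long endNanos) {
        long d = endNanos - startNanos;
        count++;
        totalNanos += d;
        minNanos = Math.min(minNanos, d);
        maxNanos = Math.max(maxNanos, d);
    }

    /** Formats like the server's "stat" line, but in client-observed milliseconds. */
    synchronized String summary(double runSeconds) {
        if (count == 0) {
            return "no requests";
        }
        return String.format(Locale.ROOT,
                "Latency min/avg/max: %.1f/%.1f/%.1f ms, throughput: %.1f req/s",
                minNanos / 1e6, totalNanos / 1e6 / count, maxNanos / 1e6, count / runSeconds);
    }
}
```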
>
> I think it's important to understand your goal. In general, there are
> two things one would like to achieve: low latency and high throughput.
> Achieving both is hard, especially when disk I/O and fsync are involved.
>
> thanks,
> Kishore G
>
> On Thu, Oct 23, 2014 at 11:21 AM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:
>
> > Hi Rakesh,
> >
> > First of all, the ZooKeeper ensemble consists of five ZooKeeper servers.
> > I also have another 10 client machines used to send write requests
> > to ZooKeeper. The benchmark code creates 5 threads (equal to the number
> > of ZooKeeper servers), and each thread is associated with one ZooKeeper
> > server. So, in this case, each ZooKeeper server receives a set of write
> > requests. The benchmark code runs for 30 seconds.
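The benchmark layout described here (one thread per server, each issuing writes for a fixed duration) might look roughly like the following sketch. `SyncWriter` is a hypothetical interface standing in for a blocking create against one server; its implementation, the paths and the error handling are assumptions, not the actual benchmark code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class Benchmark {
    /** Hypothetical blocking write against the server with the given index. */
    interface SyncWriter {
        void write(int serverIndex, long seq) throws Exception;
    }

    /** One thread per server, each writing until the deadline; returns completed writes. */
    static long run(int servers, long durationMs, SyncWriter writer) {
        AtomicLong completed = new AtomicLong();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMs);
        ExecutorService pool = Executors.newFixedThreadPool(servers);
        for (int s = 0; s < servers; s++) {
            final int server = s;
            pool.execute(() -> {
                long seq = 0;
                while (System.nanoTime() - deadline < 0) { // overflow-safe nanoTime comparison
                    try {
                        writer.write(server, seq++);
                        completed.incrementAndGet();
                    } catch (Exception e) {
                        return; // a real benchmark would log and count the failure
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(durationMs + 10_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```

Dividing the returned count by the run length gives throughput; per-request latency would be recorded inside the loop around `writer.write(...)`.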
> >
> > Async tests:
> >
> > * Number of clients
> > In fact, I ran several tests, each with a different number of clients.
> > For example, the following shows the latency for different
> > numbers of clients:
> > Five clients: Latency min/avg/max: 235/366/515
> > Ten clients: Latency min/avg/max: 252/368/505
> >
> > * Number of threads
> > As explained above, each client creates 5 threads, and each thread
> > connects to one ZooKeeper server. For instance, in the test using 5 client
> > machines, each ZooKeeper server receives five threads.
> >
> > * Data size stored in each znode
> > The data stored in each znode is 100 bytes.
> >
> > Also, it would be good to monitor:
> >
> > 1) JVM stats (one way is through JMX) like heap and GC activity. This
> > is to see whether latency spikes correspond to GC activity.
> >
> > If by JVM stats you mean the four-letter stat command, then the
> > latency results shown above were generated using this command. If you
> > mean something else, I have to read about it and get back to you later.
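Rakesh's first suggestion refers to the JVM's own management beans, which are separate from ZooKeeper's four-letter commands. A minimal sketch using the standard `java.lang.management` API to read heap usage and cumulative GC counts and times. It reports on the JVM it runs in; watching the ZooKeeper servers themselves would mean reading the same beans over a remote JMX connection or with a JMX console, which is not shown here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class JvmStats {
    /** Heap usage plus one line per garbage collector, for the current JVM. */
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        sb.append(String.format("heap used=%d MB committed=%d MB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Counts and times are cumulative since JVM start; diff two snapshots to spot a spike.
            sb.append(String.format("gc %s: collections=%d time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }
}
```

Taking a snapshot before and after a latency spike and comparing the collection counts shows whether a GC ran in between.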
> >
> > 2) Since you suspect fsync, I think $ iostat would be helpful
> > to see disk statistics. For example, $ iostat -d -x 2 10 collects
> > extended disk statistics, including latency, every 2 seconds, 10 times.
> >
> > Yes, the batch size that I use in the SyncRequestProcessor class is 1000
> > requests. I think this is a suitable size. Also, I will try using iostat.
> >
> > 3) CPU usage through the top or sar Unix commands. I haven't used sar,
> > but I can see it gives more details, like the percentage of time the CPU
> > is idle while a process is waiting for block I/O, etc.
> >
> > Yes, I will use the top command to gather resource utilization.
> > However, I don't think top or sar will answer my question, because I
> > think there is a difference between asynchronous and synchronous
> > mode when measuring latency.
> >
> > Thank you for your attention
> >
> > I look forward to hearing from you
> >
> >
> > Ibrahim
> >
> > -----Original Message-----
> > From: Rakesh Radhakrishnan [mailto:rakeshr.apache@gmail.com]
> > Sent: Thursday, October 23, 2014 03:58 PM
> > To: user@zookeeper.apache.org
> > Subject: Re: Latency in asynchronous mode
> >
> > Hi Ibrahim,
> >
> > For the async tests, could you give details such as:
> >
> > * number of clients
> > * number of threads
> > * data size storing in each znode
> >
> > Also, it would be good to monitor:
> >
> > 1) JVM stats (one way is through JMX) like heap and GC activity. This
> > is to see whether latency spikes correspond to GC activity.
> >
> > 2) Since you suspect fsync, I think $ iostat would be helpful
> > to see disk statistics. For example, $ iostat -d -x 2 10 collects
> > extended disk statistics, including latency, every 2 seconds, 10 times.
> >
> > 3) CPU usage through the top or sar Unix commands. I haven't used sar,
> > but I can see it gives more details, like the percentage of time the CPU
> > is idle while a process is waiting for block I/O, etc.
> >
> >
> > -Rakesh
> >
> >
> > On Thu, Oct 23, 2014 at 6:44 PM, Alexander Shraer <shralex@gmail.com> wrote:
> >
> > > Maybe it's due to queueing at the leader in asynchronous mode: if in
> > > your experiment you have one client in sync mode, the leader has
> > > just one op in the queue at a time.
> > >
> > > On Oct 23, 2014 1:57 PM, "Ibrahim" <i.s.el-sanosi@newcastle.ac.uk> wrote:
> > >
> > > > Hi folks,
> > > >
> > > > I am testing ZooKeeper latency in asynchronous mode. I am sending
> > > > update (write) requests to a ZooKeeper cluster that consists of 5
> > > > physical ZooKeeper servers.
> > > >
> > > > When I run the stat command, I get high latency, such as:
> > > > Latency min/avg/max: 7/339/392
> > > > Latency min/avg/max: 1/371/627
> > > > Latency min/avg/max: 1/371/627
> > > > Latency min/avg/max: 1/364/674
> > > > I guess such high latency corresponds to fsync (batched requests).
> > > > I would appreciate it if someone could explain this behaviour.
> > > >
> > > > However, testing ZooKeeper in synchronous mode gives me
> > > > reasonable results, such as:
> > > > Latency min/avg/max: 6/24/55
> > > > Latency min/avg/max: 7/22/61
> > > > Latency min/avg/max: 7/30/65
> > > >
> > > > Note that latency is measured in milliseconds.
> > > >
> > > > I look forward to hearing from you.
> > > >
> > > > Ibrahim
> > > >
> > > > --
> > > > View this message in context:
> > > > http://zookeeper-user.578899.n2.nabble.com/Latency-in-asynchronous-mode-tp7580446.html
> > > > Sent from the zookeeper-user mailing list archive at Nabble.com.