zookeeper-user mailing list archives

From Chang Song <tru64...@me.com>
Subject Re: Serious problem processing heartbeat on login stampede
Date Sat, 16 Apr 2011 06:25:17 GMT

On Apr 16, 2011, at 2:21 PM, Ted Dunning wrote:

> You know, I think it would help if you would answer some of the questions that people have posed.
> You say that it takes 1000 clients over 8 seconds to register.  That is about 100 transactions per second.

The scenario that actually reproduces the problem is not the one I described initially.
It is not login; it is the session expiry and close process.

I know. We have tested ZK many times at loads well above this in our environment and saw no problem.
Sorry about the confusion.
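The rate in Ted's question can be checked with a line of arithmetic. A minimal Python sketch using only the thread's numbers (1000 clients over 8 seconds); the 100x factor simply restates "two orders of magnitude" and is not a measured ZooKeeper benchmark:

```python
def transactions_per_second(transactions: int, seconds: float) -> float:
    """Average request rate over a time window."""
    return transactions / seconds


# Numbers from the thread: 1000 clients registering over about 8 seconds.
observed = transactions_per_second(1000, 8)

# "Two orders of magnitude" faster means roughly 100x the observed rate.
# This only restates Ted's comparison; it is not a measured benchmark.
implied_typical = observed * 100

print(f"observed: {observed:.0f} tx/s, implied typical: {implied_typical:.0f} tx/s")
```

This gives 125 tx/s, which Ted rounds to "about 100".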

> That is two orders of magnitude slower than others have observed ZK to be.  This is a really big difference.
> So there is a big discrepancy here.  I am not saying you didn't observe what you say, but I do think that there is something that you haven't mentioned because you haven't noticed it yet.  If you go through the questions people have asked and answer them, there is a good chance you will notice something that is causing your problems.  There is likely to be a problem in the way that you have set up your machines.
> One pending question is whether you have separate log and snapshot disks.  Do you?

I have already answered this: we have no separate disk.
We have a single filesystem mount point on RAID1 disks.
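For readers checking their own setup: ZooKeeper writes the transaction log to `dataLogDir` when it is set, and otherwise into `dataDir` alongside the snapshots. A minimal Python sketch, assuming a zoo.cfg made of plain `key=value` lines; the helper names are hypothetical, and because `st_dev` identifies a mounted filesystem rather than a physical disk, two directories on one RAID1 mount (as described here) count as shared:

```python
import os


def parse_zoo_cfg(path: str) -> dict:
    """Parse a zoo.cfg-style file of key=value lines, ignoring blanks and comments."""
    cfg = {}
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            cfg[key.strip()] = value.strip()
    return cfg


def log_and_snapshot_share_device(cfg: dict) -> bool:
    """True if the transaction log and snapshots live on the same filesystem.

    If dataLogDir is not set, ZooKeeper writes the transaction log into dataDir.
    st_dev identifies the mounted filesystem, not individual physical disks.
    """
    data_dir = cfg["dataDir"]
    log_dir = cfg.get("dataLogDir", data_dir)
    return os.stat(data_dir).st_dev == os.stat(log_dir).st_dev
```

For example, `log_and_snapshot_share_device(parse_zoo_cfg(path))`, with `path` pointing at the server's zoo.cfg, returns `True` for a single-mount layout like the one described above.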

> Another is whether you have other processes running on the disk.  Are there?

Our ZK ensemble servers are dedicated to the ZK ensemble only.

> Another is a request that you post some of the output of iostat with 5 second sampling rate.  Can you post that output?

I will, though it will be on Monday.
But please note that I used to be a kernel filesystem engineer, and I know how to read iostat.
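Ted's request corresponds to running `iostat -x 5` from the sysstat package. As a rough cross-check on Linux, per-device utilization over the same 5-second windows can be derived from `/proc/diskstats`. This is a minimal Python sketch under that assumption (column positions follow the kernel's documented diskstats layout; the function names are hypothetical), not a replacement for iostat:

```python
import time


def read_diskstats(path: str = "/proc/diskstats") -> dict:
    """Return {device: (sectors_written, io_ticks_ms)} parsed from /proc/diskstats."""
    stats = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            # Lines with fewer than 14 columns lack the I/O-time column; skip them.
            if len(fields) < 14:
                continue
            # 0-based index 9: sectors written (512-byte units);
            # index 12: milliseconds spent doing I/O.
            stats[fields[2]] = (int(fields[9]), int(fields[12]))
    return stats


def utilization(before: dict, after: dict, interval_s: float) -> dict:
    """Per-device (%util, write KB/s) between two snapshots taken interval_s apart."""
    result = {}
    for dev, (sectors1, ticks1) in after.items():
        if dev not in before:
            continue
        sectors0, ticks0 = before[dev]
        util = 100.0 * (ticks1 - ticks0) / (interval_s * 1000.0)
        write_kbs = (sectors1 - sectors0) * 512 / 1024.0 / interval_s
        result[dev] = (util, write_kbs)
    return result


def sample(interval_s: float = 5.0, path: str = "/proc/diskstats") -> None:
    """Print per-device %util and write KB/s every interval_s seconds until interrupted."""
    prev = read_diskstats(path)
    while True:
        time.sleep(interval_s)
        cur = read_diskstats(path)
        for dev, (util, write_kbs) in sorted(utilization(prev, cur, interval_s).items()):
            print(f"{dev:12s} util={util:5.1f}%  write={write_kbs:9.1f} KB/s")
        prev = cur
```

Calling `sample(5)` on an ensemble member during the stampede prints one line per device for each 5-second window, which is the granularity Ted asked for instead of 5-minute averages.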

> There are others questions that you will find in the email history.
> Remember, people answering your questions here are doing so because they are nice and because they like to build a sense of community.  But to get a lot from them, you need to work with them.

Please let me know if there are other questions I should answer.
I will try to update the JIRA issue with the answers from these emails.

Thank you.

> 2011/4/15 Chang Song <tru64ufs@me.com>
> I have filed a JIRA bug:
> https://issues.apache.org/jira/browse/ZOOKEEPER-1049
> We have measured I/O wait again but found no I/O activity caused by ZK,
> just the regular page cache sync daemon at work: 0-3%.
> I will have my team attach the ZK stat results.
> Thanks a lot.
> Let's move this discussion to JIRA.
> On Apr 15, 2011, at 7:34 AM, Ted Dunning wrote:
> > You said that, but there was some skepticism from others about this.
> >
> > You need to try the monitoring that was suggested.  5 minute averages are
> > not useful.
> >
> > What does the stat four letter command return?  (http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkCommands)
> >
> > 2011/4/14 Chang Song <tru64ufs@me.com>
> >
> >> 2. We have a boot disk and a usr disk.
> >>   But as I said, disk I/O is not what is causing the 8-second delay.
> >>
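The stat four-letter command Ted mentions is sent as plain text over the client port; the server writes its reply and closes the connection. A minimal Python sketch, assuming the default client port 2181 and a server that permits the command (newer ZooKeeper releases restrict four-letter words through the `4lw.commands.whitelist` setting); the helper names are hypothetical, and the exact output fields vary by version:

```python
import socket


def four_letter_word(host: str = "localhost", port: int = 2181,
                     cmd: str = "stat", timeout: float = 5.0) -> str:
    """Send a four-letter command and return the full text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        # The server closes the connection after replying, so read until EOF.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(text: str) -> dict:
    """Collect top-level 'Key: value' lines (e.g. latency, outstanding, mode)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        # Per-client lines are indented; keep only unindented key/value lines.
        if sep and not line.startswith(" "):
            fields[key.strip()] = value.strip()
    return fields
```

For example, `parse_stat(four_letter_word("zk1.example.com"))` (the host name is a placeholder) typically includes entries such as `Latency min/avg/max` and `Outstanding`, which show whether requests are queuing on that server while sessions expire.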
