zookeeper-user mailing list archives

From Eric Bowman <ebow...@boboco.ie>
Subject Re: Debugging help for SessionExpiredException
Date Wed, 16 Jun 2010 09:21:48 GMT
Setting up a little process to run overnight that appends a timestamp to
a file once per second or so can be a very effective tool for ruling
out, for example, "extra-dimensional" VM effects.

On 06/16/2010 12:15 AM, Patrick Hunt wrote:
> I'm not very experienced personally with running zk on ec2 smalls, Ted
> usually has the ec2 related insight. Given these boxes are not loaded
> or lightly loaded, and you've ruled out gc/swap, the only thing I can
> think of is that something is going on under the covers at the vm
> level that's causing the high latency you're seeing.
> You're seeing 15 _minutes_ max latency. I can't think of what would
> cause that inside zk. Any chance that the VM is shutting down or
> "freezing" during that period? I don't know. Are you monitoring that
> system from a second system? Perhaps that might shed some light
> (monitor the cpu/disk activity using some monitoring tool like
> ganglia, nagios, etc... or even more primitive, perhaps doing a ping
> to that system and tracking the round trip time/packet loss, dump to a
> file and review the next day, etc...)
> Patrick
> On 06/15/2010 03:59 PM, Jordan Zimmerman wrote:
>> They're small instances. The thing is that these machines are doing
>> next to no work. We're just running simple little tests. The session
>> expiration has not happened while I've been watching. It tends to
>> happen overnight.
>> -JZ
>> On Jun 15, 2010, at 1:50 PM, Ted Dunning wrote:
>>> As usual, the ZK team provides the best feedback.
>>> I would be bold enough to ask what kind of ec2 instances you are
>>> running on.  Small instances are small chunks of larger machines
>>> and are sometimes subject to competition for resources from the
>>> other tenants.
>>> On Tue, Jun 15, 2010 at 12:30 PM, Patrick Hunt <phunt@apache.org> wrote:
>>> 3) under-provisioned virtual machines (ie vmware)
>>> ...
>>> Given that you've ruled out the gc (most common), disk utilization
>>> would be the next thing to check.

Eric Bowman
Boboco Ltd
