hadoop-zookeeper-user mailing list archives

From Travis Crawford <traviscrawf...@gmail.com>
Subject Re: Misbehaving zk servers
Date Thu, 29 Apr 2010 17:09:46 GMT
On Thu, Apr 29, 2010 at 9:49 AM, Patrick Hunt <phunt@apache.org> wrote:
> Is there any good (simple/fast/bulletproof) way to monitor the FD use inside
> the jvm? If so we could stop accepting new client connections once we get
> close to the os imposed limit... The test would have to be a bulletproof one
> though - we wouldn't want to end up in some worse situation (where we refuse
> connection because we mistakenly believe that the limit has been reached).
>
> Might be good to open a JIRA for this and add some tests. In particular we
> should verify the server handles this as gracefully as it can when the limit
> has been reached.

Poking around with jconsole I found two stats that already measure FDs:

- java.lang.OperatingSystem.MaxFileDescriptorCount
- java.lang.OperatingSystem.OpenFileDescriptorCount

They're described (rather tersely) at:

http://java.sun.com/javase/6/docs/jre/api/management/extension/com/sun/management/UnixOperatingSystemMXBean.html
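
For anyone who wants to read the same two numbers from inside the JVM rather than through jconsole, here is a minimal sketch. It assumes a Sun/Oracle or OpenJDK runtime on Unix, where the platform bean implements com.sun.management.UnixOperatingSystemMXBean. The class name FdStats is made up for illustration and is not part of ZooKeeper.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper for illustration; not part of ZooKeeper.
final class FdStats {
    /**
     * Returns {open, max} file descriptor counts, or null when the
     * platform bean does not expose them (e.g. non-Unix JVMs).
     */
    static long[] read() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] stats = read();
        if (stats == null) {
            System.out.println("FD counts not available on this JVM");
        } else {
            System.out.println("open=" + stats[0] + " max=" + stats[1]);
        }
    }
}
```

Returning null instead of a guessed value keeps the caller from acting on numbers the JVM cannot actually provide.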

So it sounds like the feature request would be to stop accepting new
client connections when OpenFileDescriptorCount > 95% of
MaxFileDescriptorCount, and only resume accepting new connections when
OpenFileDescriptorCount < 90% of MaxFileDescriptorCount? Basically the
high/low watermark thing.
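
A rough sketch of the high/low watermark logic, to make the idea concrete. The class name FdWatermarkGate and its methods are hypothetical, not existing ZooKeeper code; the open and max values are assumed to come from the two OperatingSystem attributes above. If the limit is unknown it leaves the state unchanged, so it never starts refusing connections based on a bad reading.

```java
// Hypothetical illustration of high/low watermark hysteresis; not ZooKeeper code.
final class FdWatermarkGate {
    private final double low;   // resume accepting below this fraction, e.g. 0.90
    private final double high;  // stop accepting above this fraction, e.g. 0.95
    private boolean accepting = true;

    FdWatermarkGate(double low, double high) {
        if (!(0.0 < low && low < high && high <= 1.0)) {
            throw new IllegalArgumentException("need 0 < low < high <= 1");
        }
        this.low = low;
        this.high = high;
    }

    /**
     * Feeds in the current FD counts and returns whether new client
     * connections should be accepted.
     */
    synchronized boolean update(long open, long max) {
        if (max <= 0 || open < 0) {
            return accepting; // unusable reading: keep the current state
        }
        double usage = (double) open / max;
        if (accepting && usage > high) {
            accepting = false;      // crossed the high watermark
        } else if (!accepting && usage < low) {
            accepting = true;       // dropped back below the low watermark
        }
        return accepting;
    }
}
```

With new FdWatermarkGate(0.90, 0.95), usage between 90% and 95% keeps whatever state the gate was already in, which avoids flapping around a single threshold.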

Thoughts?

--travis




>
> Patrick
>
> On 04/29/2010 09:34 AM, Mahadev Konar wrote:
>>
>> Hi Travis,
>>
>>  How many clients did you have connected to this server? The default
>> is usually 8K file descriptors. Did you have more clients than that?
>>
>> Also, if clients fail to attach to a server, they will run off to another
>> server. We do not do any blacklisting because we expect the server to heal,
>> and if it does not, it shuts itself down in most cases.
>>
>> Thanks
>> mahadev
>>
>>
>> On 4/29/10 12:08 AM, "Travis Crawford"<traviscrawford@gmail.com>  wrote:
>>
>>> Hey zookeeper gurus -
>>>
>>> We recently had a zookeeper outage after one ZK server was started with
>>> a low file descriptor limit following an upgrade to 3.3.0. The outage
>>> occurred several days later, when that node reached the limit and clients
>>> started having major issues.
>>>
>>> Are there any circumstances when a ZK server will get blacklisted from
>>> the ensemble? Something similar to how tasktrackers are blacklisted
>>> when too many tasks fail.
>>>
>>> Thanks!
>>> Travis
>>
>
