"100mb partition"? sounds like virtualization. resource starvation
(worse in virtualized env) is a common cause of this. Are your clients
gcing/swapping at all? If a client gc's for long periods of time the
heartbeat thread won't be able to run and the server will expire the
session. There is a min/max cap that the server places on the client
timeouts (it's negotiated), check the client log for detail on what
timeout it negotiated (logged in 3.3 releases)
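For reference, the server clamps the client-requested timeout to a range derived from tickTime unless the bounds are set explicitly. A sketch of the relevant zoo.cfg settings (values here are illustrative, not a recommendation):

```
# zoo.cfg -- illustrative values
tickTime=2000
# if unset, the defaults are 2*tickTime (min) and 20*tickTime (max)
minSessionTimeout=4000
maxSessionTimeout=60000
```

A client asking for a timeout outside [minSessionTimeout, maxSessionTimeout] gets the nearest bound, so a 1-minute request only takes effect if maxSessionTimeout allows it.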
Take a look at this and see if you can make progress:
http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting
My guess is that your client is GCing for long periods of time. You
can rule this in or out by turning on GC logging in your clients and
reviewing the results after the next such incident (try GCHisto for a
graphical view).
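To turn on GC logging, you could start the client JVM with something like the following (flag names assume a Sun/Oracle JDK of this era; the log path and main class are placeholders):

```
java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
     -Xloggc:/tmp/client-gc.log -cp <your-classpath> YourClientMain
```

Then scan the log for pauses (full GCs in particular) approaching or exceeding your session timeout.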
Patrick
On 06/09/2010 11:36 AM, Jordan Zimmerman wrote:
> We have a test system using Zookeeper. There is a single Zookeeper
> server node and 4 clients. There is very little activity in this
> system. After a day's testing we start to see SessionExpiredException
> on the client. Things I've tried:
>
> * Increasing the session timeout to 1 minute
> * Making sure all JVMs are running in a 100MB partition
>
> Any help debugging this problem would be appreciated. What kind of
> diagnostics can I add? Are there more config parameters that I
> should try?
>
> -JZ