zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: question regarding connectionloss
Date Tue, 02 Feb 2010 19:15:10 GMT
You should never see connection loss unless there is a network partition 
or some other problem that disrupts communication between the client and 
the server (client swapping? server swapping, or either one having GC 
pause issues? etc.). Are you monitoring your hosts, network, JVMs, and so 
on? Are the cluster hosts "over-virtualized"?

Take a look at your client and server logs and see if you can determine 
what the issue is. You might also try network-level tools like ping or 
ssh to verify connectivity between server and client. See this page for 
issues people have had in the past:
For example, "Hardware misconfiguration - NIC" caused one system to 
basically work, but with huge numbers of connection losses, especially 
whenever there was load (and I've seen this particular issue twice now).
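
While the root cause is being tracked down, a read such as getData is idempotent, so it can simply be retried when it fails with connection loss. The following is only a minimal sketch under stated assumptions: it uses no ZooKeeper classes, and ConnectionLoss, retry, maxAttempts and pauseMs are hypothetical names made up for illustration (ConnectionLoss stands in for KeeperException.ConnectionLossException).

```java
import java.util.function.Supplier;

public class ConnectionLossRetry {
    /** Hypothetical stand-in for KeeperException.ConnectionLossException. */
    public static class ConnectionLoss extends RuntimeException {
        public ConnectionLoss(String message) { super(message); }
    }

    /**
     * Runs an idempotent read up to maxAttempts times, pausing pauseMs
     * between attempts, and rethrows the last ConnectionLoss if all fail.
     */
    public static <T> T retry(Supplier<T> read, int maxAttempts, long pauseMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConnectionLoss last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return read.get();
            } catch (ConnectionLoss e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(pauseMs);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt and give up retrying.
                        Thread.currentThread().interrupt();
                        throw e;
                    }
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated read: fails twice with connection loss, then succeeds.
        String data = retry(() -> {
            if (++calls[0] < 3) {
                throw new ConnectionLoss("simulated");
            }
            return "payload";
        }, 5, 10);
        System.out.println(data + " after " + calls[0] + " attempts");
    }
}
```

In a real client the supplier would wrap the zk.getData (path, false, null) call and translate the checked ConnectionLossException. Retrying only hides the symptom; frequent connection loss still points to one of the problems listed above.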



Michael Bauland wrote:
> Hi Ted,
> thanks for your reply.
>> This page about ZooKeeper error handling
>> <http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling> may
>> help.
> I actually read this page before. You may have misunderstood my
> question. I know how to recover from the connectionloss exception. I was
> just curious why it occurred so often in my described scenario. I would
> have assumed that in that scenario it shouldn't occur at all, but it was
> almost half of the requests that returned with a connectionloss.
> Cheers,
> Michael
>> On Mon, Feb 1, 2010 at 4:30 AM, Michael Bauland <Michael.Bauland@knipp.de> wrote:
>>> Hello,
>>> I've got a question regarding the connectionloss exception thrown by Java.
>>> I've got an ensemble running with three zk servers. If one of the three
>>> servers is not running, the whole ensemble should still work (and it
>>> does, so that's fine). But in this situation I experience quite often a
>>> connectionloss exception and I'm wondering if I'm doing something wrong
>>> or if that's to be expected.
>>> My Code is rather simple:
>>> I create a new connection to my ensemble using
>>> ZooKeeper zk = new ZooKeeper (connectString, timeOut, new MyWatcher ());
>>> where connectString contains all three servers. Then I use the ZooKeeper
>>> to read data from a certain path:
>>> zk.getData (path, false, null);
>>> This call quite often returns an exception like
>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>> KeeperErrorCode = ConnectionLoss for /125/170/test
>>> But according to your documentation, the connectionloss exception should
>>> only occur in the following two cases:
>>>>    1. The application calls an operation on a session that is no longer
>>>>       alive/valid
>>> This should not be the case, since I only just created the session.
>>>>    2. The ZooKeeper client disconnects from a server when there are
>>>>       pending operations to that server, i.e., there is a pending
>>>>       asynchronous call.
>>> This should also not be the case. I was just doing a read request and no
>>> other client was accessing the ensemble.
>>> My only idea is that maybe the connection call first tried to connect to
>>>  the zookeeper server that was not running (remember only two of the
>>> three servers are running) and before it had a chance to try to connect
>>> to one of the other servers, my getData call was made and failed with
>>> connectionloss. Could that be the reason?
>>> But I thought the connection handling was automatic and if a connection
>>> failed the client would automatically try any of the other listed
>>> servers without the user noticing!?
>>> Thanks for any help.
>>> Cheers,
>>> Michael
>>> --
>>> Michael Bauland
>>> michael.bauland@knipp.de
>>> bauland.tel
> --
> Michael Bauland
> michael.bauland@knipp.de
> bauland.tel
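
If the hypothesis in the original question holds (the first getData is issued before the client has finished connecting, while it is still trying the server that is down), one possible workaround is to block until the watcher reports a connected state before making the first call. The sketch below is an assumption-laden illustration that uses only the JDK: ConnectedGate, onConnected and awaitConnected are hypothetical names, and the second thread merely simulates the client's event thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ConnectedGate {
    private final CountDownLatch connected = new CountDownLatch(1);

    /** Called by the watcher once it sees the connected state. */
    public void onConnected() {
        connected.countDown();
    }

    /** Waits up to timeoutMs; returns true only if the connection was reported. */
    public boolean awaitConnected(long timeoutMs) {
        try {
            return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt and report "not connected".
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ConnectedGate gate = new ConnectedGate();
        // Simulates the client's event thread reporting the connection a little later.
        Thread eventThread = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
                return;
            }
            gate.onConnected();
        });
        eventThread.start();
        if (gate.awaitConnected(2000)) {
            System.out.println("connected, safe to issue the first read");
        } else {
            System.out.println("not connected within the timeout");
        }
    }
}
```

With the real client, the process method of the Watcher passed to the ZooKeeper constructor (MyWatcher in the question) would call onConnected when event.getState() is KeeperState.SyncConnected, and the application would call awaitConnected before the first getData.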
