airavata-dev mailing list archives

From Jordan Zimmerman
Subject Re: Intermittent connection loss error
Date Wed, 22 Apr 2015 17:55:09 GMT
You can lose connections to ZK for a variety of reasons:

* Network partition
* Restart of the leader node (causes loss of ZK quorum while a new leader is elected)
* Slow network (causing a client to miss a heartbeat to the server)
* Random server crashes, etc.

The standard ZooKeeper client can transparently handle most of these. When it loses connection
to a server, it will try to reconnect to another server. However, it could fail to connect
before the session times out. If this happens, you must re-create the ZooKeeper handle. Also,
while it’s trying to connect, all ZooKeeper API calls will fail until connection is re-established.
If your code isn’t prepared to handle these types of things, you’ll have problems.
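To make the difference between a dropped connection and an expired session concrete, here is a minimal, self-contained Java sketch. It does not use the ZooKeeper library: the `State` enum only mirrors three of the names in ZooKeeper's `Watcher.Event.KeeperState`, while `Action` and `decide` are hypothetical names made up for illustration.

```java
// Stand-alone sketch: models how a watcher could react to connection events.
// State mirrors three of ZooKeeper's KeeperState names; Action and decide()
// are hypothetical helpers, not part of any ZooKeeper API.
final class ConnectionEventSketch {
    enum State { SyncConnected, Disconnected, Expired }

    enum Action { RESUME, WAIT_FOR_RECONNECT, RECREATE_HANDLE }

    static Action decide(State state) {
        switch (state) {
            case SyncConnected:
                // Connected (again): pending work can proceed.
                return Action.RESUME;
            case Disconnected:
                // The client reconnects by itself; keep the handle and
                // expect API calls to fail until it succeeds.
                return Action.WAIT_FOR_RECONNECT;
            case Expired:
                // The session is gone; only a new handle can recover.
                return Action.RECREATE_HANDLE;
            default:
                throw new IllegalArgumentException("unknown state: " + state);
        }
    }
}
```

In this sketch only `Expired` leads to a new handle. Creating one on every `Disconnected` event would throw away a session that the client may still be able to recover.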

Curator helps in several ways (note: Curator is a wrapper around the standard ZooKeeper client):

* Curator has a built-in retry mechanism so that clients are immune to short-term connection
losses. Normally the ZooKeeper client can reconnect to another server very quickly, so
the first API call may fail, but Curator will retry it a few times and it’s likely to succeed
on subsequent tries.

* Curator monitors the internal ZooKeeper connection. Any Curator API call will wait until the
connection is established, and if the ZooKeeper session fails, Curator will transparently create
a new ZooKeeper instance. Most of the drudgery of managing the ZooKeeper connection is done for
you by Curator.
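
The retry idea in the first point can be sketched in plain Java as a bounded loop with exponential backoff. This is only an illustration of the technique, not Curator's implementation: `ConnectionLossException` and `callWithRetry` are hypothetical stand-ins, and Curator itself is configured with retry policy objects (for example `ExponentialBackoffRetry`) rather than a helper like this.

```java
import java.util.concurrent.Callable;

// Stand-alone sketch of a bounded retry loop with exponential backoff.
// ConnectionLossException and callWithRetry are hypothetical stand-ins;
// they are not Curator or ZooKeeper classes.
final class RetrySketch {
    static final class ConnectionLossException extends Exception {
        ConnectionLossException(String message) { super(message); }
    }

    static <T> T callWithRetry(Callable<T> operation, int maxRetries, long baseSleepMs)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return operation.call();
            } catch (ConnectionLossException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted: surface the failure
                }
                // Sleep baseSleepMs, then 2x, 4x, ... before the next attempt.
                Thread.sleep(baseSleepMs << attempt);
            }
        }
    }
}
```

A caller would wrap each operation that may hit a short connection loss, e.g. `RetrySketch.callWithRetry(() -> readNode(), 3, 1000L)`, where `readNode` is a placeholder for the caller's own code. Only the connection-loss case is retried here; any other exception propagates immediately.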

These are the main features that will help with connection problems in ZK. Additionally, Curator…

* Has dozens of pre-built recipes that are production tested by thousands of sites: locks,
leaders, caches, etc.

* Has lots of nice utilities that make writing new recipes much easier.

* Has APIs that work around well-known ZK edge cases, e.g. guaranteed deletes, protected sequential
node creation, automatic parent node creation, etc.

That said, even with Curator, writing correct ZooKeeper applications is not easy. I usually
tell people “Friends don’t let friends write ZooKeeper recipes”. If you want me to review
some of your usages, I can do that.

I hope this helps.


On April 22, 2015 at 9:44:30 AM, Suresh Marru wrote:

Hi Jordan,

Can you please advise us on this issue? Within Apache Airavata, we are using ZooKeeper for
coordination of services. Intermittently, we see ZK connection loss errors.

Is this an issue Curator will help mitigate?

Can you please also shed some light on how to decide when to use ZK vs. Curator?


On Apr 22, 2015, at 12:59 AM, Lahiru Ginnaliya Gamathige wrote:

---------- Forwarded message ----------
From: <>
Date: Wed, Apr 22, 2015 at 12:51 AM
Subject: Re: Intermittent connection loss error

Hi Lahiru Ginnaliya Gamathige,

Once in a while there may be a time change caused by NTP which causes all ZooKeeper client
sessions to close. This may be the reason. Please refer
(But this has been fixed in 3.5.1). This may help you.

Indira Priyadharshini
From: Lahiru Ginnaliya Gamathige
Sent: Tuesday, April 21, 2015 8:13 PM
Subject: Intermittent connection loss error

Hi Devs,

We are using ZK in Apache Airavata, and when we run it for some time, some
connections get lost and never reconnect. I get the following error, and
since I try to reconnect in my process method, it keeps trying and exhausts
the log. Of course I can fix the log issue, but I am not sure why this is
happening. I am using ZK in standalone mode, just a single instance, and below
are the log and the code I use to reconnect.

2015-04-08 09:43:10,785 [main-SendThread(] WARN
org.apache.zookeeper.ClientCnxn - Session 0x0 for server, unexpected error, closing socket
connection and attempting reconnect Connection reset by peer
at Method)
at org.apache.zookeeper.ClientCnxn$

synchronized public void process(WatchedEvent watchedEvent) {
    synchronized (mutex) {
        Event.KeeperState state = watchedEvent.getState();
        switch (state) {
            case SyncConnected:
            case Expired:
            case Disconnected:
                try {
                    mutex = -1;
                    zk = new ZooKeeper(..., AiravataZKUtils.getZKTimeout(), this);
                    synchronized (mutex) {
                        mutex.wait();  // waiting for the syncConnected event
                    }
                } catch (IOException e) {
                    logger.error("Error while synchronizing with zookeeper", e);
                } catch (ApplicationSettingsException e) {
                    logger.error("Error while synchronizing with zookeeper", e);
                } catch (InterruptedException e) {
                    logger.error("Error while synchronizing with zookeeper", e);
                } catch (AiravataSystemException e) {
                    logger.error("Error while synchronizing with zookeeper", e);
                }
        }
    }
}


Research Assistant
Science Gateways Group
Indiana University