zookeeper-bookkeeper-user mailing list archives

From Uma Maheswara Rao G <mahesw...@huawei.com>
Subject RE: ZooKeeper Session Expiration
Date Wed, 02 May 2012 11:59:54 GMT
I have seen session expiry events mainly when we unplug the network between the node and the ZK servers (mostly
while developing a failover controller framework with ZK). This may not be a usual
scenario, but it can happen.



Coming to the BK case: when we lose the ZK handle's connectivity, simply replacing the handle may not always be possible,
because we cannot be sure exactly when we will be able to create a new connection to ZK.

So maybe the fix could be that BK clients throw an exception, since they cannot serve
when ZK is not available, and let the application take action?
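
The fail-fast idea above could look roughly like the following. This is only a sketch in Python, not BookKeeper code (the real client is Java); `ZkUnavailableError`, `LedgerClient`, and `write_with_fallback` are hypothetical names made up for illustration.

```python
class ZkUnavailableError(Exception):
    """Hypothetical error raised when the ZooKeeper handle is unusable."""


class LedgerClient:
    """Toy client (hypothetical) that refuses to serve while ZooKeeper is down."""

    def __init__(self):
        self.zk_connected = True

    def add_entry(self, data):
        if not self.zk_connected:
            # Fail fast instead of silently trying to replace the handle.
            raise ZkUnavailableError("ZooKeeper is not available")
        return len(data)


def write_with_fallback(client, data, on_unavailable):
    """Application-level policy: the caller decides what happens on failure."""
    try:
        return client.add_entry(data)
    except ZkUnavailableError as err:
        return on_unavailable(err)


client = LedgerClient()
ok = write_with_fallback(client, b"abc", lambda err: None)

# Simulate losing ZooKeeper; the application chooses to buffer the entry.
client.zk_connected = False
buffered = []
failed = write_with_fallback(client, b"abc", lambda err: buffered.append(b"abc"))
```

Here the client only reports the problem; whether to buffer, retry later, or shut down is left to the application.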



Regards,

Uma

________________________________
From: Flavio Junqueira [fpj@yahoo-inc.com]
Sent: Tuesday, May 01, 2012 6:59 PM
To: bookkeeper-user@zookeeper.apache.org
Subject: Re: ZooKeeper Session Expiration

I don't know if this is your case, but we have seen in the past with zookeeper such issues
caused by GC pauses. I remember one case with hbase, and I think it is this one:

https://issues.apache.org/jira/browse/HBASE-1316

We have seen zookeeper clusters serving thousands of clients, so ~100 shouldn't be a problem.
Still, session expiration is part of zookeeper, so we need to deal with it here as well.

-Flavio

On May 1, 2012, at 3:14 PM, John Nagro wrote:

Flavio -

We're trying to get to the bottom of it. As I understand it, in a properly configured and
operating Zk Cluster we should never see a session expiration exception. Globally (including
all systems) we see them perhaps once a week for the last month - and it causes some issues
in our system. We saw one last night, and bookkeeper had an issue a couple days ago.

We do have a lot of nodes connecting to zookeeper for various things. We have a home-built
configuration management tool that uses zk as the data store, the bookkeeper stuff obviously
does, my coordination on top of the bookkeeper ledgers uses it, etc. So yes, lots of machines
(dozens up to ~100) talk to this zk cluster in some fashion or another - we have other clusters
too. Ultimately, more machines will talk to the configuration stuff in the long term. I could
potentially move my zk stuff off that cluster if you think it would help.

-John

On Tue, May 1, 2012 at 8:44 AM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:
This is definitely not ideal. If you lose your zookeeper session, then you're not able to
close your open ledgers, which will force ledger recovery. It is not a correctness issue,
but it is certainly inconvenient. We need to fix it, and I'm glad that Uma is already looking into
it.

I'm curious about why you're getting session expirations, though. Is it frequent, or did you get
it only once? Do you have many nodes connecting to your ZooKeeper instance?

-Flavio


On May 1, 2012, at 2:07 PM, John Nagro wrote:

Thanks Uma - that is exactly what i am looking for. The way i am handling it now is to pass
a bookkeeper client factory rather than an instance. When i encounter zk session expiration,
i create a new client and discard the old one - getting a fresh set of connections to zk.
Perhaps not ideal, but it gets the job done.
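
The factory approach described above could be sketched roughly as follows. This is a minimal illustration in Python rather than real BookKeeper client code (the actual client is Java); `SessionExpiredError`, `RecoveringClient`, and `FlakyClient` are hypothetical names invented for the example.

```python
class SessionExpiredError(Exception):
    """Hypothetical stand-in for a ZooKeeper session-expiration error."""


class RecoveringClient:
    """Holds a client factory instead of a single client instance.

    When an operation fails with SessionExpiredError, the old client is
    closed and discarded, a fresh one is built from the factory, and the
    operation is retried once.
    """

    def __init__(self, factory):
        self._factory = factory
        self._client = factory()
        self.rebuilds = 0

    def call(self, operation):
        try:
            return operation(self._client)
        except SessionExpiredError:
            self._replace_client()
            # Single retry: a second failure propagates to the caller.
            return operation(self._client)

    def _replace_client(self):
        old = self._client
        try:
            old.close()
        except Exception:
            pass  # best-effort cleanup of the dead client
        self._client = self._factory()
        self.rebuilds += 1


class FlakyClient:
    """Toy client (hypothetical) whose first instance has an expired session."""

    created = 0

    def __init__(self):
        FlakyClient.created += 1
        self.expired = FlakyClient.created == 1
        self.closed = False

    def write(self, data):
        if self.expired:
            raise SessionExpiredError("session expired")
        return len(data)

    def close(self):
        self.closed = True


wrapper = RecoveringClient(FlakyClient)
result = wrapper.call(lambda c: c.write(b"entry"))
```

Retrying only once keeps a persistently unavailable ZooKeeper from turning into an endless rebuild loop.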

thanks!

-John

On Tue, May 1, 2012 at 12:09 AM, Uma Maheswara Rao G <maheswara@huawei.com> wrote:
Hi John,

The BK client needs to handle session expiry events from ZK. Here is the issue for that: BOOKKEEPER-225 (https://issues.apache.org/jira/browse/BOOKKEEPER-225).
We will implement it soon. I hope this answers your question. Please correct me if I have misinterpreted
your question.

Thanks a lot,
Uma
________________________________
From: John Nagro [jnagro@hubspot.com]
Sent: Tuesday, May 01, 2012 1:20 AM
To: bookkeeper-user@zookeeper.apache.org
Subject: ZooKeeper Session Expiration

Hello -

If I start seeing ZKExceptions in the Bk Client, which appear to be due to SessionExpiration
errors... it seems that the BookKeeper client never recovers from that? Is that correct?

Thanks!

-John Nagro





flavio
junqueira
senior research scientist

fpj@yahoo-inc.com
direct +34 93-183-8828

avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301

