hbase-issues mailing list archives

From "nkeywal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6523) HConnectionImplementation still does not recover from all ZK issues.
Date Wed, 08 Aug 2012 17:40:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431238#comment-13431238 ]

nkeywal commented on HBASE-6523:
--------------------------------

I agree, the ZooKeeper list comes in handy for these questions :-).

As I understand it (to be validated by ZK experts), ConnectionLoss means that we lost the
connection but expect it to come back. When it does, we receive all the pending events, so
there should be no data loss. With a SessionTimeout, on the other hand, we may have lost
events, so we should re-register the watchers and, from an application point of view, take
into account that we may have missed events in the meantime.
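
A minimal Java sketch of the distinction above, not HBase code. ZkState and
SessionStateTracker are hypothetical names; the three states are modelled on the connection
states the ZooKeeper client reports (Disconnected, SyncConnected, Expired), and the action
names are only illustrative labels.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the connection states reported by the ZooKeeper client.
enum ZkState { DISCONNECTED, SYNC_CONNECTED, EXPIRED }

// Records what an application would do on each state change (illustrative only).
class SessionStateTracker {
    private final List<String> actions = new ArrayList<>();
    private boolean mayHaveMissedEvents = false;

    void onStateChange(ZkState state) {
        switch (state) {
            case DISCONNECTED:
                // Connection lost, session still alive: wait for the client to reconnect.
                actions.add("wait");
                break;
            case SYNC_CONNECTED:
                // Same session again: watchers are still registered, nothing to rebuild.
                actions.add("resume");
                break;
            case EXPIRED:
                // Session gone: watchers are gone too, and events may have been missed.
                mayHaveMissedEvents = true;
                actions.add("recreate-session");
                actions.add("re-register-watchers");
                actions.add("re-read-state");
                break;
        }
    }

    List<String> actions() { return actions; }

    boolean mayHaveMissedEvents() { return mayHaveMissedEvents; }
}
```

Feeding it DISCONNECTED then SYNC_CONNECTED records only "wait" and "resume"; only EXPIRED
triggers the rebuild steps and sets the missed-events flag.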

The way we manage session timeouts in HBase/RecoverableZK is tricky: we retry because we
expect that a parallel abort will have triggered the re-creation of the ZK session, so our
next retry will run on a brand new ZK session (and a new ZooKeeper object in the
RecoverableZK) and so it will succeed.
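
The retry pattern described above could look roughly like the following Java sketch. It is
not the RecoverableZooKeeper implementation: ZkHandle, StaleSessionException,
BoundedRetryReader and replaceHandle are hypothetical names, and the "parallel abort" is
reduced to some other code path calling replaceHandle.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for "the session behind this handle has expired".
class StaleSessionException extends Exception {}

// Hypothetical stand-in for a ZooKeeper handle bound to one session.
interface ZkHandle {
    String read(String path) throws StaleSessionException;
}

class BoundedRetryReader {
    private final AtomicReference<ZkHandle> handle;
    private final int maxRetries;

    BoundedRetryReader(ZkHandle initial, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        this.handle = new AtomicReference<>(initial);
        this.maxRetries = maxRetries;
    }

    // Assumed to be called by the abort path once it has opened a new session.
    void replaceHandle(ZkHandle fresh) {
        handle.set(fresh);
    }

    String read(String path) throws StaleSessionException {
        StaleSessionException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                // Re-read the reference on every attempt so a swapped-in handle is picked up.
                return handle.get().read(path);
            } catch (StaleSessionException e) {
                last = e; // The retry only helps if the handle is replaced in the meantime.
            }
        }
        throw last;
    }
}
```

The retry succeeds only when something else replaces the handle between attempts; otherwise
the exception reaches the caller once the retries are used up.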

As we retry only a limited number of times in RecoverableZK, for a ConnectionLoss we may stop
retrying before the session timeout occurs and throw the exception to the calling layer. As
such, it may become an unrecoverable error from an HBase point of view. I think that if we
want to fix this, we should change RecoverableZooKeeper to keep retrying on a ConnectionLoss
until the session timeout occurs. There may also be calls that do not go through the
recoverable ZK (if I remember correctly I've seen a few, and it was justified, I believe).
But we should not re-create a session for a connection loss (it could have bad side effects,
for example ZK having to manage too many sessions, the old and the new).
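
One way to picture the proposed change is the Java sketch below. It is only an illustration
under assumptions, not a patch: ConnectionLossException and SessionExpiredException are
local stand-ins for the ZooKeeper client's KeeperException.ConnectionLossException and
KeeperException.SessionExpiredException, and RetryUntilExpired and pauseMillis are
hypothetical names.

```java
import java.util.concurrent.Callable;

// Local stand-ins for the two ZooKeeper failure modes discussed above.
class ConnectionLossException extends Exception {}
class SessionExpiredException extends Exception {}

class RetryUntilExpired {
    // Retries op on connection loss with a fixed pause, without opening a new session.
    // Any other exception, including SessionExpiredException, goes to the caller.
    static <T> T call(Callable<T> op, long pauseMillis) throws Exception {
        while (true) {
            try {
                return op.call();
            } catch (ConnectionLossException e) {
                // Same session, connection may come back: wait and try again.
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```

The loop has no retry limit on purpose: it ends either when the operation succeeds or when
the operation reports that the session has expired, at which point the caller has to deal
with possibly missed events.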

                
> HConnectionImplementation still does not recover from all ZK issues.
> --------------------------------------------------------------------
>
>                 Key: HBASE-6523
>                 URL: https://issues.apache.org/jira/browse/HBASE-6523
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.0
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>         Attachments: 6523.txt
>
>
> During some testing here at Salesforce.com we found another scenario where an HConnectionImplementation
> would never recover from a lost ZK connection.


        
