hbase-issues mailing list archives

From "Jeffrey Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8365) Duplicated ZK notifications cause Master abort (or other unknown issues)
Date Thu, 18 Apr 2013 18:30:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635488#comment-13635488 ]

Jeffrey Zhong commented on HBASE-8365:

A nodeDataChanged event will only give you the latest data, because the old data can no longer
be read. ZooKeeper intentionally sends out notifications without the original state that
triggered them; it relies on clients to fetch the latest state. In addition, a ZooKeeper
watcher is a one-time trigger: it fires only once, and the client must re-set the watcher
on the same znode to get the next notification.
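To make the one-shot semantics concrete, here is a minimal sketch. `FakeZnode` is a hypothetical in-memory stand-in (not ZooKeeper or HBase code) whose watchers are consumed when they fire, so the client callback must both fetch the latest data and re-arm a watcher, mirroring what the Master has to do:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class OneShotWatcherDemo {
    // Hypothetical in-memory znode illustrating one-shot watcher semantics.
    static class FakeZnode {
        private String data;
        private final List<Consumer<FakeZnode>> watchers = new ArrayList<>();

        // Analogous to getData(path, watch, stat): returns current data and
        // arms a one-time watcher.
        String getData(Consumer<FakeZnode> watcher) {
            watchers.add(watcher);
            return data;
        }

        void setData(String newData) {
            data = newData;
            List<Consumer<FakeZnode>> toFire = new ArrayList<>(watchers);
            watchers.clear();               // one-shot: watchers are consumed
            toFire.forEach(w -> w.accept(this));
        }
    }

    @SuppressWarnings("unchecked")
    public static List<String> observedStates() {
        List<String> seen = new ArrayList<>();
        FakeZnode znode = new FakeZnode();
        // On every notification: fetch the latest data AND re-arm the watcher.
        Consumer<FakeZnode>[] holder = new Consumer[1];
        holder[0] = z -> seen.add(z.getData(holder[0]));
        znode.setData("OPENING");           // no watcher armed yet: missed
        znode.getData(holder[0]);           // arm the first watcher
        znode.setData("FAILED_OPEN");       // fires, callback re-arms
        znode.setData("OFFLINE");           // seen only because we re-armed
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observedStates()); // [FAILED_OPEN, OFFLINE]
    }
}
```

Note that the OPENING update is missed entirely because no watcher was armed yet, and OFFLINE is seen only because the callback re-armed the watcher: exactly the two failure modes a client must design around.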

In our case, from the log, the updates to the region with a watcher set are: 1) opening->opening
2) opening->failed_open 3) failed_open->offline 4) offline->opening

The first notification (when we got FAILED_OPEN) was triggered by the opening->opening update.
By the time the Master received the notification, the znode had already changed to failed_open;
that is the first nodeDataChanged trace.

The thing that puzzles me is that the ZooKeeper watcher is re-set on the failed_open state after
we receive the first failed_open, so we should only get further notifications when the
failed_open state changes. Yet we later got one more failed_open from the same znode, and its
data has the same version as the first notification we received. My guess is that either the ZK
client reads stale cached data when the node state changes from failed_open -> offline, or a
race condition on the ZK side causes the duplicate notifications.
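Since the duplicate carried the exact same data version, one defensive option (a hypothetical guard, not something in the 0.94 code) is to remember the last version handled per znode path and drop any notification whose data version matches it:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical guard: drop a notification if we already handled an event
// for the same znode path with the same data version -- which is exactly
// what the duplicate FAILED_OPEN events in the log looked like.
public class DuplicateEventGuard {
    private final Map<String, Integer> lastHandledVersion = new HashMap<>();

    /** Returns true if the event should be handled, false if it is a dup. */
    public boolean shouldHandle(String znodePath, int dataVersion) {
        Integer last = lastHandledVersion.get(znodePath);
        if (last != null && last == dataVersion) {
            return false;                  // same data, same version: duplicate
        }
        lastHandledVersion.put(znodePath, dataVersion);
        return true;
    }

    public static void main(String[] args) {
        DuplicateEventGuard guard = new DuplicateEventGuard();
        String path = "/hbase/unassigned/fa0e7a5590feb69bd065fbc99c228b36";
        System.out.println(guard.shouldHandle(path, 3)); // first FAILED_OPEN
        System.out.println(guard.shouldHandle(path, 3)); // duplicate, skipped
        System.out.println(guard.shouldHandle(path, 4)); // later update, handled
    }
}
```

This only masks the symptom; the underlying question of why ZK delivered two notifications for one version would still need a ZooKeeper-side answer.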

> Duplicated ZK notifications cause Master abort (or other unknown issues)
> ------------------------------------------------------------------------
>                 Key: HBASE-8365
>                 URL: https://issues.apache.org/jira/browse/HBASE-8365
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.6
>            Reporter: Jeffrey Zhong
>         Attachments: TestResult.txt
> The duplicated ZK notifications should happen in trunk as well, but since the way we handle
ZK notifications is different in trunk, we don't see the issue there. I'll explain below.
> The issue makes TestMetaReaderEditor.testRetrying flaky, with the error message {code}reader:
count=2, t=null{code} A related link: https://builds.apache.org/job/HBase-0.94/941/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/
> The test case failure is due to an IllegalStateException that aborts the master, so the
rest of the test cases after testRetrying fail as well.
> Below are the steps showing why the issue happens (region fa0e7a5590feb69bd065fbc99c228b36
is the one of interest):
> 1) Got first notification event RS_ZK_REGION_FAILED_OPEN at 2013-04-04 17:39:01,197
> {code} DEBUG [pool-1-thread-1-EventThread] master.AssignmentManager(744): Handling transition=RS_ZK_REGION_FAILED_OPEN,
server=janus.apache.org,42093,1365097126155, region=fa0e7a5590feb69bd065fbc99c228b36{code}
> In this step, the AM tries to open the region on another RS in a separate thread
> 2) Got second notification event RS_ZK_REGION_FAILED_OPEN at 2013-04-04 17:39:01,200

> {code}DEBUG [pool-1-thread-1-EventThread] master.AssignmentManager(744): Handling transition=RS_ZK_REGION_FAILED_OPEN,
server=janus.apache.org,42093,1365097126155, region=fa0e7a5590feb69bd065fbc99c228b36{code}
> 3) Later, got the opening notification event resulting from step 1 at 2013-04-04 17:39:01,288

> {code} DEBUG [pool-1-thread-1-EventThread] master.AssignmentManager(744): Handling transition=RS_ZK_REGION_OPENING,
server=janus.apache.org,54833,1365097126175, region=fa0e7a5590feb69bd065fbc99c228b36{code}
> Step 2's ClosedRegionHandler throws an IllegalStateException because it "Cannot transit it to
OFFLINE" (the state is already OPENING from notification 3), and aborts the Master. This can happen in 0.94
because we handle notifications using an ExecutorService, which opens the door to handling events
out of order even though we receive them in the order of the updates.
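The reordering hazard can be shown with a small self-contained sketch (not HBase code): two "events" are submitted in order to a multi-threaded ExecutorService, and a latch forces the second to complete before the first, deterministically reproducing the effect that is only intermittent in the real test:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Events submitted in order to a multi-threaded pool can COMPLETE out of
// order. The latch makes the reordering deterministic for demonstration.
public class OutOfOrderDemo {
    public static List<String> run() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<String> completionOrder = new CopyOnWriteArrayList<>();
        CountDownLatch secondDone = new CountDownLatch(1);

        pool.submit(() -> {                      // event 1: FAILED_OPEN
            try { secondDone.await(); } catch (InterruptedException ignored) {}
            completionOrder.add("FAILED_OPEN handled");
        });
        pool.submit(() -> {                      // event 2: OPENING
            completionOrder.add("OPENING handled");
            secondDone.countDown();
        });

        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return completionOrder;                  // event 2 finished first
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

In the real system no latch is needed: any scheduling jitter between pool threads can produce the same inversion, which is why the failure is only occasionally reproducible.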
> I've confirmed that we don't have duplicated AM listeners, and that both events were triggered by
the same ZK data of the exact same version. The issue can usually be reproduced once by running the
testRetrying test case 20 times in a loop.
> There are several issues behind the failure:
> 1) Duplicated ZK notifications. Since a ZK watcher is a one-time trigger, duplicate
notifications should not happen for the same data of the same version in the first place
> 2) ZooKeeper watcher handling is wrong in both 0.94 and trunk, as follows:
> a) 0.94 handles notifications asynchronously, which may lead to handling notifications out
of the order in which the events happened
> b) In trunk, we handle ZK notifications synchronously, which slows down other components
such as SSH, log splitting etc., because we have a single notification queue
> c) In both trunk and 0.94, we may act on stale event data because we have a long listener
list; the ZK node state could have changed by the time the event is handled. If a listener
needs to act upon the latest state, it should re-fetch the data to verify that the data which
triggered the handler hasn't changed.
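A minimal sketch of that re-check, under the assumption that the handler recorded the znode version when the event fired: re-read the node, compare versions, and bail out if the node has moved on. `CurrentNode` is a hypothetical stand-in for a fresh getData() call, not real HBase or ZooKeeper code:

```java
// Hypothetical stale-event check: act only if the znode version still
// matches the version that triggered the event.
public class StaleEventCheck {
    // Stand-in for the result of a fresh read of the znode.
    static class CurrentNode {
        final String state;
        final int version;
        CurrentNode(String state, int version) {
            this.state = state;
            this.version = version;
        }
    }

    /** Decide whether to process the event or skip it as stale. */
    public static String handle(String eventState, int eventVersion, CurrentNode now) {
        if (now.version != eventVersion) {
            return "skip: stale event (" + eventState + " v" + eventVersion
                 + " vs current " + now.state + " v" + now.version + ")";
        }
        return "handle " + eventState;
    }

    public static void main(String[] args) {
        // Event captured at FAILED_OPEN v3, but the node is already OFFLINE v4:
        System.out.println(handle("FAILED_OPEN", 3, new CurrentNode("OFFLINE", 4)));
        // Versions match, so the event is safe to act on:
        System.out.println(handle("FAILED_OPEN", 3, new CurrentNode("FAILED_OPEN", 3)));
    }
}
```

The same version comparison is what the CloseRegionHandler band-aid below amounts to: carry the expected version into the handler and skip the work when it no longer matches.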
> Suggestions:
> For 0.94, we can band-aid CloseRegionHandler to pass in the expected ZK data version
and skip event handling on stale data, with minimal impact.
> For trunk, I'll open an improvement JIRA on ZK notification handling to provide more
parallelism for handling unrelated notifications.
> For the duplicated ZK notifications, we need to bring in some ZooKeeper experts to take a look.
> Please let me know what you think, or if you have a better idea.
> Thanks!

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
