hadoop-common-issues mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-9183) Potential deadlock in ActiveStandbyElector
Date Mon, 07 Jan 2013 17:20:12 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-9183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-9183:
------------------------------

    Attachment: HADOOP-9183.patch

This patch fixes the problem by making two changes. First, the queue of events in WatcherWithClientRef
is removed; instead, the process method blocks until the ZK object has been set on the watcher.
This should be acceptable because the set operation is a simple method call, so there is
minimal overhead. Second, the locking order ActiveStandbyElector -> WatcherWithClientRef
is enforced, to prevent cycles.
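
To illustrate the first change, here is a minimal sketch of a watcher whose process method blocks until its client reference has been set, instead of queueing events. It is not the code from the patch: FakeClient, BlockingWatcherSketch, setClient and the timeout parameter are hypothetical names, and the use of a CountDownLatch to implement the wait is an assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the ZooKeeper client; not the real API. */
final class FakeClient {
    final String name;

    FakeClient(String name) {
        this.name = name;
    }
}

/** Sketch only: a watcher that waits for its client instead of queueing events. */
final class BlockingWatcherSketch {
    // Assumption: a latch is used to signal that the client reference is available.
    private final CountDownLatch clientSet = new CountDownLatch(1);
    private volatile FakeClient client;

    /** Called by the owner once the client object has been constructed. */
    void setClient(FakeClient c) {
        client = c;
        clientSet.countDown();
    }

    /** Called on the event thread; blocks until setClient has run or the timeout expires. */
    String process(String event, long timeoutMs) throws InterruptedException {
        if (!clientSet.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException("client reference was never set");
        }
        return client.name + " handled " + event;
    }
}
```

Because the setter is a single assignment followed by a countDown(), an event that arrives early should only wait briefly, which is the trade-off described above.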

Note also that the CountDownLatch can safely have its countDown() method called outside the
synchronized section (which is to protect the ZK field). Indeed it must, since getNewZooKeeper
is holding the ActiveStandbyElector object lock while it waits for the ZK connection event.
This means that the event cannot be processed until the lock is released (this is already the
behaviour today), but we still need to signal that the connect event was received.
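
The reasoning above can be sketched as follows. This is a simplified model, not the patch itself: ConnectSignalSketch and its methods are hypothetical, and a single lock object stands in for the ActiveStandbyElector monitor. The caller holds the lock while it waits on the latch, so the event thread has to call countDown() before it tries to take the lock; if countDown() were inside the synchronized block, the wait could only end by timing out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Sketch only: signal the latch outside the lock that the waiter is holding. */
final class ConnectSignalSketch {
    private final Object electorLock = new Object(); // stands in for the elector's monitor
    private final CountDownLatch connected = new CountDownLatch(1);
    private boolean eventProcessed;                  // guarded by electorLock
    private Thread eventThread;

    /** Event thread: signal first, without the lock, then do the guarded work. */
    void onConnectEvent() {
        connected.countDown();          // outside the synchronized section
        synchronized (electorLock) {    // blocks until the waiter releases the lock
            eventProcessed = true;
        }
    }

    /** Caller thread: holds the lock while waiting for the connect signal. */
    boolean connectWhileHoldingLock(long timeoutMs) throws InterruptedException {
        synchronized (electorLock) {
            eventThread = new Thread(this::onConnectEvent);
            eventThread.start();
            return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    /** Waits for the event thread to finish its guarded work. */
    void joinEventThread() throws InterruptedException {
        eventThread.join();
    }

    boolean isEventProcessed() {
        synchronized (electorLock) {
            return eventProcessed;
        }
    }
}
```

In this model the signal arrives while the lock is still held, and the guarded part of the event handling runs only after connectWhileHoldingLock returns and releases the lock.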
                
> Potential deadlock in ActiveStandbyElector
> ------------------------------------------
>
>                 Key: HADOOP-9183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9183
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.0.2-alpha
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: 2_jcarder_result_1.png, 3_jcarder_result_0.png, HADOOP-9183.patch
>
>
> A jcarder run found a potential deadlock in the locking of ActiveStandbyElector and ActiveStandbyElector.WatcherWithClientRef.
> No deadlock has been seen in practice; this is just a theoretical possibility at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
