hadoop-common-issues mailing list archives

From "Kihwal Lee (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-8050) Deadlock in metrics
Date Fri, 10 Feb 2012 20:44:59 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HADOOP-8050:
-------------------------------

    Attachment: hadoop-8050.patch.txt

If many methods are synchronized and the two classes containing them depend on each other,
deadlock is likely.
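
The pattern described above can be reproduced in a few lines. The following is only a minimal sketch, not code from Hadoop: the class names SystemSide and SourceSide and all of their methods are hypothetical stand-ins for two interdependent classes whose methods are all synchronized. A latch forces the bad interleaving, and the JDK's ThreadMXBean is used to confirm that the two threads are stuck on each other's monitors.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class SynchronizedDeadlockSketch {

    /** Hypothetical class A: its synchronized method calls into class B. */
    static class SystemSide {
        SourceSide peer;

        synchronized void snapshot(CountDownLatch bothLocked) {
            meet(bothLocked);   // this thread now holds SystemSide's monitor
            peer.read();        // blocks: needs SourceSide's monitor
        }

        synchronized void lookup() { }
    }

    /** Hypothetical class B: its synchronized method calls back into class A. */
    static class SourceSide {
        SystemSide peer;

        synchronized void serve(CountDownLatch bothLocked) {
            meet(bothLocked);   // this thread now holds SourceSide's monitor
            peer.lookup();      // blocks: needs SystemSide's monitor
        }

        synchronized void read() { }
    }

    /** Waits until both threads are inside their synchronized methods. */
    static void meet(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns true once the JVM reports the two threads as monitor-deadlocked. */
    static boolean demonstrateDeadlock() {
        SystemSide a = new SystemSide();
        SourceSide b = new SourceSide();
        a.peer = b;
        b.peer = a;
        CountDownLatch bothLocked = new CountDownLatch(2);

        Thread t1 = new Thread(() -> a.snapshot(bothLocked), "snapshot-thread");
        Thread t2 = new Thread(() -> b.serve(bothLocked), "serving-thread");
        t1.setDaemon(true);     // daemon threads so the JVM can still exit
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null && ids.length == 2) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + demonstrateDeadlock());
    }
}
```

Each thread takes its own object's monitor first and then asks for the other's, so neither can make progress. In real code the interleaving is rare rather than forced, which matches a deadlock that shows up only occasionally.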

The current locking in metrics is a little excessive. I do not believe strict global
consistency is required in processing metrics. For one, sources do not coordinate with each
other (they are mostly independent), so locking the whole subsystem while taking a snapshot does
not add much to the quality of the data.

This patch removes some locks around accessing the source adapter map within MetricsSystemImpl.
This makes the metric snapshot only lock on each individual source adapter, one at a time,
instead of the entire metrics impl.  This is safe because:

* Once sources are registered, they are not removed until shutdown(). Even shutdown() and
stop() are called rarely.

* During snapshot, the source adapter hashmap is the only data structure that needs protection.

* snapshot() is only called from the timer event handler. startTimer() makes sure that there
is only one timer.

I wrapped the LinkedHashMap used for the source adapter map with Collections.synchronizedMap.
This makes accessing the data structure safe without holding a big coarse lock. No further
synchronization between sources seems to be needed.
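
For readers unfamiliar with the approach, here is a minimal sketch of the idea, not the contents of the attached patch. SourceAdapter, register() and snapshotAll() are hypothetical names invented for illustration and do not match the real MetricsSystemImpl code. One assumption worth stating: per the JDK documentation, iterating over a Collections.synchronizedMap view still requires synchronizing on the map itself, so the sketch copies the values under that short lock and then locks each adapter one at a time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerSourceSnapshotSketch {

    /** Hypothetical stand-in for a source adapter; not the real Hadoop class. */
    static class SourceAdapter {
        private final String name;
        private long value;

        SourceAdapter(String name) {
            this.name = name;
        }

        synchronized void increment() {
            value++;
        }

        // Locks only this adapter, not the whole metrics system.
        synchronized String snapshot() {
            return name + "=" + value;
        }
    }

    // The map guards its own structure; no outer coarse lock is held.
    private final Map<String, SourceAdapter> sources =
        Collections.synchronizedMap(new LinkedHashMap<String, SourceAdapter>());

    void register(String name) {
        sources.put(name, new SourceAdapter(name));
    }

    SourceAdapter get(String name) {
        return sources.get(name);
    }

    /** Snapshots every source, holding at most one adapter lock at a time. */
    List<String> snapshotAll() {
        List<SourceAdapter> copy;
        synchronized (sources) {            // required while iterating the map's views
            copy = new ArrayList<>(sources.values());
        }
        List<String> out = new ArrayList<>();
        for (SourceAdapter s : copy) {      // map lock already released here
            out.add(s.snapshot());
        }
        return out;
    }

    public static void main(String[] args) {
        PerSourceSnapshotSketch metrics = new PerSourceSnapshotSketch();
        metrics.register("rpc");
        metrics.register("jvm");
        metrics.get("rpc").increment();
        System.out.println(metrics.snapshotAll());
    }
}
```

Because no thread ever holds the map lock and an adapter lock at the same time, the lock cycle behind the deadlock cannot form in this sketch. The trade-off is that a snapshot is no longer globally consistent across sources.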

                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the
> web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen
> there too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
