ambari-dev mailing list archives

From "Jonathan Hurley (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-10456) Ambari Server Deadlock When Mapping Hosts
Date Tue, 14 Apr 2015 02:45:12 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hurley updated AMBARI-10456:
-------------------------------------
    Description: 
When mapping hosts concurrently with reading information from a cluster, there was a deadlock
between building the cluster health report and mapping the new hosts.

A few changes to note here:

- ClustersImpl uses concurrent maps; there's really no need to keep the internal lock. I removed
it in several places where the cluster is guaranteed to be available (such as when using the
ID to retrieve the cluster). The concurrent maps guard against concurrent modifications.

- The Ambari Event Publisher was actually synchronous. This not only caused bottlenecks, but
also contributed to a secondary deadlock detected while fixing the original issue. It was
changed into a single-threaded asynchronous bus. Consumers of this bus should never rely on
it having performed its actions before they perform their own logic, so changing the behavior
seemed correct.
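
To make the two changes above concrete, here is a minimal sketch in plain Java. It is not
the Ambari code: AsyncBusSketch, ClusterRegistry, EventPublisher and all of their methods are
hypothetical names, and the bus is built directly on java.util.concurrent rather than on
whatever event library Ambari uses. It only illustrates the shape of the fix: lookups by ID
go straight to a concurrent map with no surrounding lock, and publish() hands the event to a
single dispatcher thread instead of running subscribers on the caller's thread.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names come from Ambari.
class AsyncBusSketch {

    /** Lookup by ID without an outer lock: the concurrent map makes get/put thread-safe. */
    static final class ClusterRegistry {
        private final Map<Long, String> clustersById = new ConcurrentHashMap<>();

        void addCluster(long id, String name) {
            clustersById.put(id, name);
        }

        String getClusterById(long id) {
            return clustersById.get(id); // null if the ID is unknown
        }
    }

    /** Single-threaded asynchronous bus: publish() queues the event and returns. */
    static final class EventPublisher {
        private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();
        private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();

        void register(Consumer<String> subscriber) {
            subscribers.add(subscriber);
        }

        void publish(String event) {
            // Subscribers run on the dispatcher thread, so the publishing thread never
            // executes subscriber code while it is holding its own locks.
            dispatcher.execute(() -> {
                for (Consumer<String> subscriber : subscribers) {
                    subscriber.accept(event);
                }
            });
        }

        /** Stops accepting events and waits for the queued ones to be delivered. */
        boolean shutdownAndAwait(long timeoutMillis) {
            dispatcher.shutdown();
            try {
                return dispatcher.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        ClusterRegistry registry = new ClusterRegistry();
        registry.addCluster(1L, "c1");

        EventPublisher bus = new EventPublisher();
        // The subscriber reads the registry on the dispatcher thread, not the caller's.
        bus.register(event -> System.out.println(
                Thread.currentThread().getName() + " handled " + event
                        + " for " + registry.getClusterById(1L)));

        bus.publish("HostAdded"); // returns without waiting for the subscriber
        System.out.println("drained: " + bus.shutdownAndAwait(5000));
    }
}
```

Because one thread drains the queue, events are still delivered in the order they were
published; what changes is that the publisher can no longer assume a subscriber has finished
by the time publish() returns.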

  was:When mapping hosts concurrently while getting clusters, there's a deadlock that can
occur between {{ClustersImpl}} and {{ClusterImpl}}.


> Ambari Server Deadlock When Mapping Hosts
> -----------------------------------------
>
>                 Key: AMBARI-10456
>                 URL: https://issues.apache.org/jira/browse/AMBARI-10456
>             Project: Ambari
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.1.0
>
>         Attachments: dump.txt
>
>
> When mapping hosts concurrently with reading information from a cluster, there was a
deadlock between building the cluster health report and mapping the new hosts.
> A few changes to note here:
> - ClustersImpl uses concurrent maps; there's really no need to keep the internal lock.
I removed it in several places where the cluster is guaranteed to be available (such as when
using the ID to retrieve the cluster). The concurrent maps guard against concurrent modifications.
> - The Ambari Event Publisher was actually synchronous. This not only caused bottlenecks,
but also contributed to a secondary deadlock detected while fixing the original issue. It
was changed into a single-threaded asynchronous bus. Consumers of this bus should never rely
on it having performed its actions before they perform their own logic, so changing the
behavior seemed correct.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
