flink-issues mailing list archives

From "Chesnay Schepler (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-10761) MetricGroup#getAllVariables can deadlock
Date Fri, 02 Nov 2018 12:24:00 GMT
Chesnay Schepler created FLINK-10761:

             Summary: MetricGroup#getAllVariables can deadlock
                 Key: FLINK-10761
                 URL: https://issues.apache.org/jira/browse/FLINK-10761
             Project: Flink
          Issue Type: Bug
          Components: Metrics
    Affects Versions: 1.6.2, 1.5.5, 1.7.0
            Reporter: Chesnay Schepler
            Assignee: Chesnay Schepler
             Fix For: 1.5.6, 1.6.3, 1.7.0

{{AbstractMetricGroup#getAllVariables}} acquires the locks of both the current and all parent
groups when assembling the variables map. This can lead to a deadlock when metrics are registered
concurrently on a child and its parent, the child registration is applied first, and the reporter
uses said method (which many do).

Assume we have a MetricGroup Mc(hild) and Mp(arent).

Two separate threads Tc and Tp each register a metric on their respective group, acquiring the
lock of that group.
Let's assume that Tc has a slight headstart.
Tc will now call {{MetricRegistry#register}} first, acquiring the MR lock.
Tp will block on this lock.

Tc now iterates over all reporters calling {{MetricReporter#notifyOfAddedMetric}}. Assume
that in this method {{MetricGroup#getAllVariables}} is called on Mc by Tc.
Tc still holds the lock to Mc, and attempts to acquire the lock to Mp.
The lock to Mp is still held by Tp however, which waits for the MR lock to be released by Tc.

Thus a deadlock is created. This may deadlock anything, be it minor threads, tasks, or entire

This has not surfaced so far since usually metrics are no longer added to a group once children
have been created (since the component initialization at that point is complete).
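
The interleaving described above can be sketched in isolation. The following is only an illustration of the lock cycle, not Flink code: the class {{MetricLockCycleDemo}}, the method {{reproducesCycle}} and the three plain {{ReentrantLock}} instances standing in for the Mc, Mp and MR locks are all hypothetical, and timed {{tryLock}} calls are used in place of blocking acquisition so that the sketch reports the cycle instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for the scenario; none of these names are Flink classes. */
final class MetricLockCycleDemo {

    /** Returns true if Tc and Tp each timed out waiting for a lock held by the other. */
    static boolean reproducesCycle(long timeoutMillis) {
        ReentrantLock childLock = new ReentrantLock();    // stands in for the lock of Mc
        ReentrantLock parentLock = new ReentrantLock();   // stands in for the lock of Mp
        ReentrantLock registryLock = new ReentrantLock(); // stands in for the MR lock

        CountDownLatch groupLocksHeld = new CountDownLatch(2);   // both group locks are taken
        CountDownLatch registryLockHeld = new CountDownLatch(1); // Tc won the race for MR
        CountDownLatch attemptsDone = new CountDownLatch(2);     // both timed attempts finished

        AtomicBoolean tcGotParent = new AtomicBoolean(true);
        AtomicBoolean tpGotRegistry = new AtomicBoolean(true);

        Thread tc = new Thread(() -> {
            childLock.lock(); // Tc registers on Mc, taking its lock
            try {
                groupLocksHeld.countDown();
                groupLocksHeld.await();
                registryLock.lock(); // head start: Tc enters the registry first
                try {
                    registryLockHeld.countDown();
                    // The reporter asks for all variables, which needs the parent's lock.
                    boolean got = parentLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
                    tcGotParent.set(got);
                    if (got) {
                        parentLock.unlock();
                    }
                    attemptsDone.countDown();
                    attemptsDone.await(); // keep holding MR until Tp has given up too
                } finally {
                    registryLock.unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                childLock.unlock();
            }
        }, "Tc");

        Thread tp = new Thread(() -> {
            parentLock.lock(); // Tp registers on Mp, taking its lock
            try {
                groupLocksHeld.countDown();
                registryLockHeld.await();
                // Tp now needs the registry lock, which Tc still holds.
                boolean got = registryLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
                tpGotRegistry.set(got);
                if (got) {
                    registryLock.unlock();
                }
                attemptsDone.countDown();
                attemptsDone.await(); // keep holding Mp until Tc has given up too
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                parentLock.unlock();
            }
        }, "Tp");

        tc.start();
        tp.start();
        try {
            tc.join();
            tp.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for Tc and Tp", e);
        }
        return !tcGotParent.get() && !tpGotRegistry.get();
    }

    public static void main(String[] args) {
        System.out.println("lock cycle reproduced: " + reproducesCycle(200));
    }
}
```

With blocking {{lock()}} calls in place of the timed attempts, neither thread could ever proceed, which is the deadlock reported here.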

This message was sent by Atlassian JIRA
