hadoop-hdfs-issues mailing list archives

From "Chen Liang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11507) NetworkTopology#chooseRandom may run into a dead loop due to race condition
Date Tue, 07 Mar 2017 20:59:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900143#comment-15900143 ]

Chen Liang commented on HDFS-11507:

I found that the global lock is already held *before and after* the {{chooseRandom}} call, in which
case {{chooseRandom}} is already synchronized with node add/remove. I was confused by the lock being
acquired again in {{countNumOfAvailableNodes}} and thought that was the first time the lock
was acquired. So this race condition does not exist. Closing this JIRA as not a problem.
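
To illustrate why the nested acquisition is harmless, here is a minimal Java sketch. It assumes, purely for illustration, a single {{java.util.concurrent.locks.ReentrantReadWriteLock}} guarding a list of node names; the class {{LockSketch}} and the methods {{add}} and {{chooseFirst}} are hypothetical, and the locks actually involved in Hadoop may be different objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a topology guarded by one re-entrant lock.
class LockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> nodes = new ArrayList<>();

    // Node add takes the write lock, so it cannot interleave with readers.
    void add(String node) {
        lock.writeLock().lock();
        try {
            nodes.add(node);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Acquires the read lock itself; if the caller already holds it,
    // this is a re-entrant acquisition, not the first one.
    int countNumOfAvailableNodes() {
        lock.readLock().lock();
        try {
            return nodes.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    // The outer caller holds the read lock across the whole selection,
    // so the count and the pick see the same set of nodes.
    String chooseFirst() {
        lock.readLock().lock();
        try {
            int available = countNumOfAvailableNodes();
            return available > 0 ? nodes.get(0) : null;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        LockSketch topology = new LockSketch();
        topology.add("dn1");
        System.out.println(topology.chooseFirst());
    }
}
```

Because the outer read lock is held for the whole call, a writer cannot remove nodes between the count and the selection, which is the situation described in this comment.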

> NetworkTopology#chooseRandom may run into a dead loop due to race condition
> ---------------------------------------------------------------------------
>                 Key: HDFS-11507
>                 URL: https://issues.apache.org/jira/browse/HDFS-11507
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Chen Liang
>            Assignee: Chen Liang
> {{NetworkTopology#chooseRandom()}} works as follows:
> 1. counts the number of available nodes as {{availableNodes}},
> 2. checks how many nodes are excluded and deducts them from {{availableNodes}},
> 3. if {{availableNodes}} is still > 0, then there are nodes available,
> 4. keeps looping to find such a node.
> But now imagine that, in the meantime, the nodes that were actually available get removed during step 3
> or step 4, and all remaining nodes are excluded nodes. Then, although no nodes are
> actually available any more, the code would still proceed because {{availableNodes}} > 0, and it would
> keep picking excluded nodes and loop forever, because
> {{if (excludedNodes == null || !excludedNodes.contains(ret))}}
> will always be false.
> We may fix this by expanding the while loop to also include the {{availableNodes}} calculation,
> so that {{availableNodes}} is recalculated every time the loop fails to find an available node.
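
The fix proposed in the quoted description (recomputing {{availableNodes}} inside the loop) could look roughly like the Java sketch below. This is not the Hadoop implementation: the class {{ChooseRandomSketch}}, the flat list of node names, and the use of {{synchronized}} are assumptions made only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical, simplified topology: a flat list of node names.
class ChooseRandomSketch {
    private final List<String> nodes = new ArrayList<>();
    private final Random random = new Random();

    synchronized void add(String node) { nodes.add(node); }
    synchronized void remove(String node) { nodes.remove(node); }

    // Returns a random node that is not excluded, or null if none exists.
    String chooseRandom(Set<String> excludedNodes) {
        while (true) {
            String ret;
            synchronized (this) {
                // Steps 1-3 are repeated on every iteration, so a node
                // removed since the last attempt is no longer counted.
                int availableNodes = nodes.size();
                if (excludedNodes != null) {
                    for (String node : nodes) {
                        if (excludedNodes.contains(node)) {
                            availableNodes--;
                        }
                    }
                }
                if (availableNodes <= 0) {
                    return null; // nothing left to choose: stop looping
                }
                ret = nodes.get(random.nextInt(nodes.size()));
            }
            // Step 4: accept the candidate only if it is not excluded.
            if (excludedNodes == null || !excludedNodes.contains(ret)) {
                return ret;
            }
        }
    }

    public static void main(String[] args) {
        ChooseRandomSketch topology = new ChooseRandomSketch();
        topology.add("dn1");
        topology.add("dn2");
        Set<String> excluded = new HashSet<>();
        excluded.add("dn1");
        System.out.println(topology.chooseRandom(excluded));
    }
}
```

With the count inside the loop, the method returns {{null}} once every remaining node is excluded instead of spinning on a stale {{availableNodes}} value.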

