Date: Wed, 19 Apr 2017 22:19:41 +0000 (UTC)
From: "Riza Suminto (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-11684) Potential scalability bug on datanode recommission

     [ https://issues.apache.org/jira/browse/HDFS-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Riza Suminto updated HDFS-11684:
--------------------------------
    Description:
We are academic researchers working on scalability bug detection. Upon analyzing the Hadoop code, our static analysis program detects a potential scalability problem along this chain of function calls:

DatanodeManager.refreshNodes() -> DatanodeManager.refreshDatanodes() -> DecommissionManager.stopDecommission() -> BlockManager.processExtraRedundancyBlocksOnInService()

DatanodeManager.refreshNodes() calls namesystem.writeLock(), which locks FSNamesystem.
While holding the FSNamesystem lock, it runs a nested loop over the datanodes in DatanodeManager.refreshDatanodes() and over each datanode's blocks in BlockManager.processExtraRedundancyBlocksOnInService(). Counting only executed lines, refreshDatanodes will execute N*(136*B+137) lines of code in total, where N is the number of datanodes and B is the number of blocks per node.

This is a potential scalability bug that may render FSNamesystem unresponsive for a long time if refreshNodes triggers the recommissioning of many fat datanodes, each holding a large number of over-replicated data blocks. This bug is also discussed in HDFS-10477, but remains unresolved.

In older Hadoop versions, our program also detected a similar scalability problem when decommissioning datanodes, which was fixed in HADOOP-4061. However, recommissioning still has this scalability problem in the current version.
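To make the pattern concrete, below is a minimal, self-contained Java sketch of the structure described above. It is not the actual HDFS code: only the method names mirror the real call chain, while the class, field, and lock names (RefreshNodesSketch, DatanodeSketch, namesystemLock, beingRecommissioned) are made-up placeholders. The point it illustrates is that all N*B block scans happen inside a single FSNamesystem write-lock acquisition.

{code:java}
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Simplified sketch of the locking pattern described above.
 * Not the actual HDFS code; only the cited method names are real,
 * everything else is a made-up placeholder.
 */
class RefreshNodesSketch {

    static class DatanodeSketch {
        boolean beingRecommissioned;   // node moving from decommissioning back to in-service
        List<String> blocks;           // the B block replicas hosted on this datanode
    }

    // stands in for the FSNamesystem lock taken via namesystem.writeLock()
    private final ReentrantReadWriteLock namesystemLock = new ReentrantReadWriteLock();

    // analogue of DatanodeManager.refreshNodes(): the whole refresh runs
    // under the namesystem write lock
    void refreshNodes(List<DatanodeSketch> datanodes) {
        namesystemLock.writeLock().lock();
        try {
            refreshDatanodes(datanodes);               // outer loop over N datanodes
        } finally {
            namesystemLock.writeLock().unlock();
        }
    }

    // analogue of DatanodeManager.refreshDatanodes() ->
    // DecommissionManager.stopDecommission()
    private void refreshDatanodes(List<DatanodeSketch> datanodes) {
        for (DatanodeSketch dn : datanodes) {          // N iterations
            if (dn.beingRecommissioned) {
                processExtraRedundancyBlocksOnInService(dn);
            }
        }
    }

    // analogue of BlockManager.processExtraRedundancyBlocksOnInService():
    // scans every block of the node while the write lock is still held,
    // so the total work inside one lock acquisition is O(N * B)
    private void processExtraRedundancyBlocksOnInService(DatanodeSketch dn) {
        for (String block : dn.blocks) {               // B iterations per node
            // check whether this block is now over-replicated and, if so,
            // queue excess replicas for removal (details elided)
        }
    }
}
{code}

For illustration only (assumed numbers, not measurements): with N = 100 recommissioned datanodes each holding B = 1,000,000 block replicas, the N*(136*B+137) estimate above comes to roughly 1.36 * 10^10 executed lines within a single write-lock hold.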
> Potential scalability bug on datanode recommission
> --------------------------------------------------
>
>                 Key: HDFS-11684
>                 URL: https://issues.apache.org/jira/browse/HDFS-11684
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.3
>            Reporter: Riza Suminto
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org