hadoop-hdfs-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7742) favoring decommissioning node for replication can cause a block to stay underreplicated for long periods
Date Sat, 28 Mar 2015 00:50:53 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384994#comment-14384994 ]

Hadoop QA commented on HDFS-7742:

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment against trunk revision 05499b1.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:


Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10095//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10095//console

This message is automatically generated.

> favoring decommissioning node for replication can cause a block to stay underreplicated for long periods
> --------------------------------------------------------------------------------------------------------
>                 Key: HDFS-7742
>                 URL: https://issues.apache.org/jira/browse/HDFS-7742
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>         Attachments: HDFS-7742-v0.patch
> When choosing a source node to replicate a block from, a decommissioning node is favored. The reason for the favoritism is that decommissioning nodes aren't servicing any writes, so in theory they are less loaded.
> However, the same selection algorithm also tries to make sure it doesn't get "stuck" on any particular node:
> {noformat}
>       // switch to a different node randomly
>       // this to prevent from deterministically selecting the same node even
>       // if the node failed to replicate the block on previous iterations
> {noformat}
> Unfortunately, the decommissioning check comes before this random switch, so the algorithm can get stuck trying to replicate from a decommissioning node. We've seen this in practice, where a decommissioning datanode was failing to replicate a block for many days even though other viable replicas of the block were available.
> Given that we limit the number of streams we'll assign to a given node (default soft limit of 2, hard limit of 4), it doesn't seem like favoring a decommissioning node has significant benefit. That is, when there is significant replication work to do, we'll quickly hit the stream limit of the decommissioning nodes and use other nodes in the cluster anyway; when there isn't significant replication work, then in theory we've got plenty of replication bandwidth available, so choosing a decommissioning node isn't much of a win.
> I see two choices:
> 1) Change the algorithm to still favor decommissioning nodes, but with some level of randomness that avoids always selecting the decommissioning node
> 2) Remove the favoritism for decommissioning nodes
> I prefer #2. It simplifies the algorithm, and given the other throttles we have in place, I'm not sure there is a significant benefit to selecting decommissioning nodes.
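
To make the quoted description concrete, here is a minimal Java sketch of the two selection strategies it contrasts. This is not the NameNode's actual code and not the attached patch: the Node class, both method names, and the single SOFT_LIMIT constant are hypothetical stand-ins, and the real logic has further checks (for example the hard stream limit) that are left out. The first method checks for a decommissioning node before the random switch, which is the ordering the report blames for getting stuck; the second drops the favoritism (option 2).

```java
import java.util.List;
import java.util.Random;

// Illustrative sketch only; all names here are hypothetical, not Hadoop APIs.
class SourceSelectionSketch {

    /** Simplified stand-in for a datanode that holds a replica of the block. */
    static final class Node {
        final String name;
        final boolean decommissioning;
        final int activeStreams;

        Node(String name, boolean decommissioning, int activeStreams) {
            this.name = name;
            this.decommissioning = decommissioning;
            this.activeStreams = activeStreams;
        }
    }

    // Default soft limit on replication streams per node, per the description.
    static final int SOFT_LIMIT = 2;

    /**
     * Ordering described in the report: the decommissioning check runs before
     * the random switch, so an eligible decommissioning node always wins.
     */
    static Node chooseFavoringDecommissioning(List<Node> replicas, Random rnd) {
        Node src = null;
        for (Node n : replicas) {
            if (n.activeStreams >= SOFT_LIMIT) {
                continue; // node already has enough replication work
            }
            if (n.decommissioning) {
                return n; // favoritism short-circuits the randomness below
            }
            // switch to a different node randomly
            if (src == null || rnd.nextBoolean()) {
                src = n;
            }
        }
        return src;
    }

    /** Option 2: no favoritism, so every eligible replica can be picked. */
    static Node chooseWithoutFavoritism(List<Node> replicas, Random rnd) {
        Node src = null;
        for (Node n : replicas) {
            if (n.activeStreams >= SOFT_LIMIT) {
                continue;
            }
            if (src == null || rnd.nextBoolean()) {
                src = n;
            }
        }
        return src;
    }
}
```

In this sketch, a decommissioning node below the stream limit is returned on every call to the first method, no matter how often it has failed to replicate the block before. With the second method, repeated calls spread across all eligible replicas, so a single bad source cannot hold the block underreplicated indefinitely.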

This message was sent by Atlassian JIRA