hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-10966) Enhance Dispatcher logic on deciding when to give up a source DataNode
Date Sat, 19 Nov 2016 00:22:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15678192#comment-15678192 ]

Zhe Zhang edited comment on HDFS-10966 at 11/19/16 12:22 AM:
-------------------------------------------------------------

Thanks, Kihwal, for the review! Uploading a new patch:
# This patch does change the Balancer's timeout behavior introduced by HDFS-4261, but I don't
think it has a negative effect. Staying in the {{dispatchBlocks}} while loop longer only adds
repeated calls to {{chooseNextMove}}, which checks local state only and puts no load on the
NameNode. And even if we jump out of the while loop, the thread for that Source cannot be reused
for another Source anyway. In {{TestBalancer}} I reset the config value to 5s, and the test run
time is normal.
# Added to {{hdfs-default.xml}}, thanks for the catch.
# I think it is a good idea; added.
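To illustrate why staying in the loop longer is cheap, here is a minimal sketch of a per-source loop that gives up on an elapsed-time basis. It is not the actual Dispatcher code: the class `SourceLoopSketch`, the `BooleanSupplier` standing in for {{chooseNextMove}}, the injectable `LongSupplier` clock, and the `maxNoMoveIntervalMs` parameter are all hypothetical. Each iteration only consults local state, and the no-move timer restarts whenever a move is scheduled, so an interval such as the 5s test value bounds how long the loop spins on a source that never yields a move.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch (not the real Dispatcher) of a per-source loop that gives up
 * after a period with no scheduled move instead of after a fixed number of failures.
 */
final class SourceLoopSketch {
  private final long maxNoMoveIntervalMs;  // hypothetical setting, e.g. 5000 ms in a test
  private final LongSupplier clockMs;      // injectable clock, so tests need not sleep

  SourceLoopSketch(long maxNoMoveIntervalMs, LongSupplier clockMs) {
    this.maxNoMoveIntervalMs = maxNoMoveIntervalMs;
    this.clockMs = clockMs;
  }

  /**
   * Loops until maxMoves moves have been scheduled or no move has been found for
   * maxNoMoveIntervalMs. chooseNextMove stands in for a check of local state only;
   * it returns true when a pending move was found and started. Returns the move count.
   */
  int run(BooleanSupplier chooseNextMove, Runnable pause, int maxMoves) {
    int moves = 0;
    long lastMoveMs = clockMs.getAsLong();
    while (moves < maxMoves) {
      if (chooseNextMove.getAsBoolean()) {
        moves++;
        lastMoveMs = clockMs.getAsLong();  // progress was made: restart the no-move timer
        continue;
      }
      if (clockMs.getAsLong() - lastMoveMs >= maxNoMoveIntervalMs) {
        break;  // give up on this source for the rest of the current iteration
      }
      pause.run();  // back off briefly before re-checking local state
    }
    return moves;
  }
}
```

With a fake clock advanced by one second per pause, a source that never offers a move is abandoned once 5 simulated seconds pass, while a source that frees up at 3 seconds still gets its moves scheduled.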



> Enhance Dispatcher logic on deciding when to give up a source DataNode
> ----------------------------------------------------------------------
>
>                 Key: HDFS-10966
>                 URL: https://issues.apache.org/jira/browse/HDFS-10966
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>            Reporter: Zhe Zhang
>            Assignee: Mark Wagner
>         Attachments: HDFS-10966.00.patch, HDFS-10966.01.patch
>
>
> When a {{Dispatcher}} thread works on a source DataNode, in each iteration it tries to
> execute a {{PendingMove}}. If no block is moved after 5 iterations, this source (over-utilized)
> DataNode is given up for this Balancer iteration (20 mins). This is problematic if the source
> DataNode was heavily loaded at the beginning of the iteration: it will quickly encounter 5
> unsuccessful moves and be abandoned.
> We should enhance this logic by, e.g., using elapsed time instead of the number of iterations.
> {code}
> // Check if the previous move was successful
>         } else {
>           // source node cannot find a pending block to move, iteration +1
>           noPendingMoveIteration++;
>           // in case no blocks can be moved for source node's task,
>           // jump out of while-loop after 5 iterations.
>           if (noPendingMoveIteration >= MAX_NO_PENDING_MOVE_ITERATIONS) {
>             LOG.info("Failed to find a pending move "  + noPendingMoveIteration
>                 + " times.  Skipping " + this);
>             resetScheduledSize();
>           }
>         }
> {code}
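The proposal to use elapsed time instead of an iteration count can be illustrated with a small simulation. This is a hedged sketch, not Balancer code: `GiveUpPolicyDemo`, the one-second pause between attempts, and the 60-second interval are assumptions chosen for illustration, while the limit of 5 mirrors {{MAX_NO_PENDING_MOVE_ITERATIONS}} in the excerpt above. The simulated source is busy for its first 10 seconds; the count-based rule abandons it at the fifth failed attempt, whereas the time-based rule keeps trying and finds a move once the source frees up.

```java
import java.util.function.LongPredicate;

/** Hypothetical simulation comparing two rules for giving up on a busy source DataNode. */
final class GiveUpPolicyDemo {
  /** A give-up rule, consulted after each failed attempt to find a pending move. */
  interface GiveUpRule {
    boolean shouldGiveUp(int failedAttempts, long msSinceLastMove);
  }

  /** Count-based rule: give up after a fixed number of consecutive failures. */
  static GiveUpRule iterationCount(int maxFailures) {
    return (failed, elapsed) -> failed >= maxFailures;
  }

  /** Time-based rule: give up only after a period with no successful move. */
  static GiveUpRule elapsedTime(long maxNoMoveMs) {
    return (failed, elapsed) -> elapsed >= maxNoMoveMs;
  }

  /**
   * Simulates attempts spaced pauseMs apart on a source that can move a block only
   * when sourceIsFree(nowMs) is true. Returns the simulated time of the first
   * successful move, or -1 if the rule gave up first.
   */
  static long simulate(GiveUpRule rule, LongPredicate sourceIsFree, long pauseMs) {
    long now = 0;
    int failed = 0;
    while (true) {
      if (sourceIsFree.test(now)) {
        return now;  // a pending move was found
      }
      failed++;
      // No move has succeeded yet, so elapsed time is measured from the start.
      if (rule.shouldGiveUp(failed, now)) {
        return -1;  // source abandoned for this Balancer iteration
      }
      now += pauseMs;  // wait before trying again
    }
  }
}
```

For a source that is busy until t = 10s, `iterationCount(5)` gives up at t = 4s, while `elapsedTime(60_000)` succeeds at t = 10s; a source that never frees up is still abandoned under the time-based rule, just after 60 simulated seconds instead of 5 attempts.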



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


