hadoop-hdfs-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10716) In Balancer, the target task should be removed when its size < 0.
Date Thu, 05 Jan 2017 01:20:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799925#comment-15799925 ]

Junping Du commented on HDFS-10716:
-----------------------------------

Sounds like we were missing the JIRA number in the commit log.
{noformat}
commit cefa21e98a12b06602ee8000f8cef6c3b17af999
Author: Tsz-Wo Nicholas Sze <szetszwo@hortonworks.com>
Date:   Thu Aug 4 09:45:40 2016 -0700

    In Balancer, the target task should be removed when its size < 0.  Contributed by Yiqun Lin
{noformat}

> In Balancer, the target task should be removed when its size < 0.
> -----------------------------------------------------------------
>
>                 Key: HDFS-10716
>                 URL: https://issues.apache.org/jira/browse/HDFS-10716
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>            Priority: Minor
>             Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
>         Attachments: HDFS-10716.001.patch, failing.log
>
>
> In HDFS-10602, we found a failing case in which the balancer keeps moving data between
> the same 2 DNs and therefore never finishes. While debugging this, I found what seems to
> be a bug in how pending blocks are chosen in {{Dispatcher.Source.chooseNextMove}}.
> The code:
> {code}
>     private PendingMove chooseNextMove() {
>       for (Iterator<Task> i = tasks.iterator(); i.hasNext();) {
>         final Task task = i.next();
>         final DDatanode target = task.target.getDDatanode();
>         final PendingMove pendingBlock = new PendingMove(this, task.target);
>         if (target.addPendingBlock(pendingBlock)) {
>           // target is not busy, so do a tentative block allocation
>           if (pendingBlock.chooseBlockAndProxy()) {
>             long blockSize = pendingBlock.reportedBlock.getNumBytes(this);
>             incScheduledSize(-blockSize);
>             task.size -= blockSize;
>             // If the size of bytes that need to be moved was first reduced to less than 0
>             // it should also be removed.
>             if (task.size == 0) {
>               i.remove();
>             }
>             return pendingBlock;
>             //...
> {code}
> The value of task.size is assigned in {{Balancer#matchSourceWithTargetToMove}}:
> {code}
>     long size = Math.min(source.availableSizeToMove(), target.availableSizeToMove());
>     final Task task = new Task(target, size);
> {code}
> This value depends on the source and target nodes, and it will not always be reduced to
> exactly 0 while pending blocks are chosen. When that happens, the balancer keeps moving
> data to the target node even though the number of bytes left to move has already dropped
> below 0. This eventually makes the cluster imbalanced again, which triggers another round
> of balancing.
> We can optimize this as the title suggests; I think this can speed up the balancer.
> See the log of the failing case, or see HDFS-10602. (Focus on the change record for the
> scheduled size of the target node. That is debug information I added, like this.)
> {code}
> 2016-08-01 16:51:57,492 [pool-51-thread-1] INFO  balancer.Dispatcher (Dispatcher.java:chooseNextMove(799)) - TargetNode: 58794, bytes scheduled to move, after: -67, before: 33
> {code}
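
The effect described above can be reproduced in isolation. The following is only a minimal sketch, not the Dispatcher code: {{TaskRemovalSketch}}, its {{Task}} class and {{scheduleMoves}} are hypothetical names, and the fixed block size of 100 bytes is an assumption derived from the log line (before: 33, after: -67). It compares the {{== 0}} check from the snippet with the {{<= 0}} check that the issue title proposes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class TaskRemovalSketch {
    /** Hypothetical stand-in for a balancer task: only the remaining byte budget. */
    static final class Task {
        long size;
        Task(long size) { this.size = size; }
    }

    /**
     * Schedules fixed-size block moves against a single task until the task
     * is removed or maxMoves is reached. Returns the number of moves scheduled.
     * removeWhenNonPositive selects the proposed "<= 0" check; otherwise "== 0" is used.
     */
    static int scheduleMoves(long taskSize, long blockSize, int maxMoves,
                             boolean removeWhenNonPositive) {
        List<Task> tasks = new ArrayList<>();
        tasks.add(new Task(taskSize));
        int moves = 0;
        while (!tasks.isEmpty() && moves < maxMoves) {
            Iterator<Task> i = tasks.iterator();
            Task task = i.next();
            task.size -= blockSize; // one block is scheduled per call
            moves++;
            boolean done = removeWhenNonPositive ? task.size <= 0 : task.size == 0;
            if (done) {
                i.remove();
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        // 33 bytes left, 100-byte blocks: the size jumps from 33 to -67 and never equals 0.
        System.out.println("== 0 check: " + scheduleMoves(33, 100, 5, false) + " moves");
        System.out.println("<= 0 check: " + scheduleMoves(33, 100, 5, true) + " moves");
    }
}
```

With the {{== 0}} check the task is still in the list when the cap of 5 moves is reached, because the size skips past zero. With the {{<= 0}} check it is removed after the first move.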



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

