hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
Date Wed, 10 Sep 2014 19:43:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128983#comment-14128983
] 

Hadoop QA commented on HDFS-6621:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12667795/HDFS-6621_problem1.patch
  against trunk revision b67d5ba.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified
tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version
2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:red}-1 core tests{color}.  The test build failed in hadoop-hdfs-project/hadoop-hdfs


    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7987//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7987//console

This message is automatically generated.

> Hadoop Balancer prematurely exits iterations
> --------------------------------------------
>
>                 Key: HDFS-6621
>                 URL: https://issues.apache.org/jira/browse/HDFS-6621
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer
>    Affects Versions: 2.2.0, 2.4.0
>         Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0
>            Reporter: Benjamin Bowman
>              Labels: balancer
>         Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3, HDFS-6621.patch_4,
HDFS-6621_problem1.patch
>
>
> I have been having an issue with the balancing being too slow.  The issue was not with
the speed with which blocks were moved, but rather the balancer would prematurely exit out
of it's balancing iterations.  It would move ~10 blocks or 100 MB then exit the current iteration
(in which it said it was planning on moving about 10 GB). 
> I looked in the Balancer.java code and believe I found and solved the issue.  In the
dispatchBlocks() function there is a variable, "noPendingBlockIteration", which counts the
number of iterations in which a pending block to move cannot be found.  Once this number gets
to 5, the balancer exits the overall balancing iteration.  I believe the desired functionality
is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon
block moves.  So once this number reaches 5 - even if there have been thousands of blocks
moved in between these no pending block iterations  - the overall balancing iteration will
prematurely end.  
> The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found
and scheduled.  In this way, my iterations do not prematurely exit unless there is 5 consecutive
no pending block iterations.   Below is a copy of my dispatchBlocks() function with the change
I made.
> {code}
>     private void dispatchBlocks() {
>       long startTime = Time.now();
>       long scheduledSize = getScheduledSize();
>       this.blocksToReceive = 2*scheduledSize;
>       boolean isTimeUp = false;
>       int noPendingBlockIteration = 0;
>       while(!isTimeUp && getScheduledSize()>0 &&
>           (!srcBlockList.isEmpty() || blocksToReceive>0)) {
>         PendingBlockMove pendingBlock = chooseNextBlockToMove();
>         if (pendingBlock != null) {
>           noPendingBlockIteration = 0;
>           // move the block
>           pendingBlock.scheduleBlockMove();
>           continue;
>         }
>         /* Since we can not schedule any block to move,
>          * filter any moved blocks from the source block list and
>          * check if we should fetch more blocks from the namenode
>          */
>         filterMovedBlocks(); // filter already moved blocks
>         if (shouldFetchMoreBlocks()) {
>           // fetch new blocks
>           try {
>             blocksToReceive -= getBlockList();
>             continue;
>           } catch (IOException e) {
>             LOG.warn("Exception while getting block list", e);
>             return;
>           }
>         } else {
>           // source node cannot find a pendingBlockToMove, iteration +1
>           noPendingBlockIteration++;
>           // in case no blocks can be moved for source node's task,
>           // jump out of while-loop after 5 iterations.
>           if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
>             setScheduledSize(0);
>           }
>         }
>         // check if time is up or not
>         if (Time.now()-startTime > MAX_ITERATION_TIME) {
>           isTimeUp = true;
>           continue;
>         }
>         /* Now we can not schedule any block to move and there are
>          * no new blocks added to the source block list, so we wait.
>          */
>         try {
>           synchronized(Balancer.this) {
>             Balancer.this.wait(1000);  // wait for targets/sources to be idle
>           }
>         } catch (InterruptedException ignored) {
>         }
>       }
>     }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message