Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Fri, 7 Aug 2015 18:42:45 +0000 (UTC)
From: "Ming Ma (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12853013.1438969822000.7953.1438972965996@Atlassian.JIRA>
In-Reply-To: <JIRA.12853013.1438969822000@Atlassian.JIRA>
References: <JIRA.12853013.1438969822000@Atlassian.JIRA>
 <JIRA.12853013.1438969822642@arcas>
Subject: [jira] [Commented] (HDFS-8875) Optimize the wait time in Balancer
 for federation scenario
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662244#comment-14662244 ] 

Ming Ma commented on HDFS-8875:
-------------------------------

bq. For the Collections.shuffle(connectors); call: I can see this being advantageous in the scenario where the balancer is constantly behind. With the shuffle, you won't always start with the same namespace.

I thought the balancer waits until one namespace finishes moving before going on to the next namespace/iteration, no?

bq. Even with federation, we still might run into the case where we would want to sleep between iterations
Agreed. I didn't mean to get rid of the wait time. The optimization could be along the lines of what you suggested.

Another question is whether we should process different namespaces in parallel.

> Optimize the wait time in Balancer for federation scenario
> ----------------------------------------------------------
>
>                 Key: HDFS-8875
>                 URL: https://issues.apache.org/jira/browse/HDFS-8875
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ming Ma
>            Assignee: Chris Trezzo
>
> Balancer waits between two consecutive iterations to give block movements time to be fully committed (a return from replaceBlock doesn't mean that the NN's blockmap has been updated and the block has been invalidated on the source node).
> This wait time could be 23 seconds if {{dfs.heartbeat.interval}} is set to 10 and {{dfs.namenode.replication.interval}} is set to 3. In the case of federation, given that we iterate through all namespaces in each iteration, this wait time becomes unnecessary: while the balancer is processing the next namespace, the namespace it just finished has time to commit.
> In addition, Balancer calls {{Collections.shuffle(connectors);}}, which doesn't seem necessary.
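
To make the arithmetic in the description concrete, here is a minimal sketch. It assumes the wait is computed as two heartbeat intervals plus one replication interval, which is consistent with the 23-second figure above but is not quoted from the Balancer source. The class and both helper names are hypothetical, and remainingSleepMs is only one possible way to credit the time spent on other namespaces against the wait.

```java
import java.util.concurrent.TimeUnit;

public class BalancerWaitSketch {

    /**
     * Assumed formula (not quoted from the Balancer source):
     * two heartbeat intervals plus one replication interval, in milliseconds.
     */
    static long waitTimeMs(long heartbeatSec, long replicationSec) {
        return TimeUnit.SECONDS.toMillis(2 * heartbeatSec + replicationSec);
    }

    /**
     * Hypothetical helper: the part of the wait that is still owed to a
     * namespace, given how long ago its last round of block moves finished.
     * Time spent balancing other namespaces counts toward the wait.
     */
    static long remainingSleepMs(long waitMs, long elapsedSinceLastMoveMs) {
        return Math.max(0L, waitMs - elapsedSinceLastMoveMs);
    }

    public static void main(String[] args) {
        // dfs.heartbeat.interval = 10, dfs.namenode.replication.interval = 3
        long waitMs = waitTimeMs(10, 3);
        System.out.println("full wait (ms): " + waitMs);

        // Other namespaces took 30 s to process: no extra sleep is needed.
        System.out.println("remaining (ms): " + remainingSleepMs(waitMs, 30_000));

        // Other namespaces took only 5 s: sleep for the remainder.
        System.out.println("remaining (ms): " + remainingSleepMs(waitMs, 5_000));
    }
}
```

With a single namespace the elapsed time is close to zero, so this sketch falls back to the full wait; the saving only shows up when other namespaces are processed in between.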


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)