Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 1 Mar 2017 20:17:45 +0000 (UTC)
From: "Benoy Antony (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.13039581.1485980837000.35250.1488399465588@Atlassian.JIRA>
In-Reply-To: <JIRA.13039581.1485980837000@Atlassian.JIRA>
References: <JIRA.13039581.1485980837000@Atlassian.JIRA> <JIRA.13039581.1485980837296@jira-lw-us.apache.org>
Subject: [jira] [Commented] (HDFS-11384) Add option for balancer to disperse
 getBlocks calls to avoid NameNode's rpc.CallQueueLength spike
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Wed, 01 Mar 2017 20:17:50 -0000


    [ https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890957#comment-15890957 ] 

Benoy Antony commented on HDFS-11384:
-------------------------------------

Sleeping inside the *synchronized* block should be avoided, as it prevents other threads from obtaining the lock while the thread is sleeping.
One tradeoff between sleeping for a fixed versus a variable interval is that the variable approach complicates the code. Since the delay is not applied by default, it is okay to sleep for a fixed interval after getBlocks().
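
To illustrate the suggestion, here is a minimal sketch of sleeping for a fixed interval after getBlocks() but outside the synchronized block. It is not the actual Balancer/Dispatcher code: the class GetBlocksDelaySketch, the field delayAfterGetBlocksMs, and the stubbed getBlocks() are all hypothetical names used only for illustration, and the stub returns placeholder block names instead of calling the NameNode.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a balancer source; not the real Hadoop code. */
final class GetBlocksDelaySketch {
    private final Object lock = new Object();
    private final List<String> pendingBlocks = new ArrayList<>();
    // Assumed option name: fixed delay in milliseconds, 0 disables it (the default).
    private final long delayAfterGetBlocksMs;

    GetBlocksDelaySketch(long delayAfterGetBlocksMs) {
        this.delayAfterGetBlocksMs = delayAfterGetBlocksMs;
    }

    /** Placeholder for the getBlocks RPC to the NameNode. */
    private List<String> getBlocks() {
        List<String> blocks = new ArrayList<>();
        blocks.add("blk_1");
        blocks.add("blk_2");
        return blocks;
    }

    /** Fetches blocks under the lock, then sleeps after the lock is released. */
    int fetchOnce() {
        int added;
        synchronized (lock) {
            List<String> fetched = getBlocks();
            pendingBlocks.addAll(fetched);
            added = fetched.size();
        }
        // The sleep happens here, outside the synchronized block, so other
        // threads can acquire the lock while this thread is waiting.
        if (delayAfterGetBlocksMs > 0) {
            try {
                Thread.sleep(delayAfterGetBlocksMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag so callers can observe it.
                Thread.currentThread().interrupt();
            }
        }
        return added;
    }

    int pendingCount() {
        synchronized (lock) {
            return pendingBlocks.size();
        }
    }
}
```

With a delay of 0 the method behaves as it would without the option, which matches the point that the delay is not applied by default.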

> Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11384
>                 URL: https://issues.apache.org/jira/browse/HDFS-11384
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>         Attachments: balancer.day.png, balancer.week.png, HDFS-11384.001.patch
>
>
> Running the balancer on a Hadoop cluster with more than 3000 DataNodes causes the NameNode's rpc.CallQueueLength to spike. We observed that this can cause HBase cluster failure due to RegionServer WAL timeouts.

