Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Tue, 25 Apr 2017 02:02:04 +0000 (UTC)
From: "Konstantin Shvachko (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.13039581.1485980837000.40563.1493085724214@Atlassian.JIRA>
In-Reply-To: <JIRA.13039581.1485980837000@Atlassian.JIRA>
References: <JIRA.13039581.1485980837000@Atlassian.JIRA> <JIRA.13039581.1485980837296@jira-lw-us.apache.org>
Subject: [jira] [Updated] (HDFS-11384) Add option for balancer to disperse
 getBlocks calls to avoid NameNode's rpc.CallQueueLength spike
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Tue, 25 Apr 2017 02:02:42 -0000


     [ https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-11384:
---------------------------------------
    Attachment: HDFS-11384.008.patch

# It took some time to reproduce the failures. I did not have any on my local box.
It looks like the solution is to mock FSNamesystem before starting the DataNodes; otherwise the behavior is non-deterministic. I changed that, and the test now runs consistently on my local box. Let's try Jenkins.
# findbugs warnings are not related to the patch.
# There are 2 checkstyle warnings.
#* One complains that the number of parameters in doTest() is more than 7. I don't know why that magic number was chosen, but doTest() already had 8 parameters and I added one.
#* The second is about an inner assignment, which is intentional in this case: I want the two variables to initially have the same value, and splitting the line into two statements would lose that meaning.
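
To illustrate the inner-assignment point, here is a minimal sketch. The class, method, and variable names are hypothetical and are not taken from the patch; it only contrasts the chained form that checkstyle flags with the split form it would prefer.

```java
public class InnerAssignmentSketch {

    // Hypothetical example: chained (inner) assignment.
    // A single statement states that both variables start from the same value.
    static long[] initTogether(long start) {
        long first;
        long second;
        first = second = start; // the form checkstyle reports as an inner assignment
        return new long[] {first, second};
    }

    // Hypothetical example: the split form.
    // Behavior is identical, but the link between the two variables is only implied.
    static long[] initSeparately(long start) {
        long first = start;
        long second = start;
        return new long[] {first, second};
    }

    public static void main(String[] args) {
        long[] together = initTogether(42L);
        long[] separately = initSeparately(42L);
        System.out.println(together[0] + " " + together[1]);
        System.out.println(separately[0] + " " + separately[1]);
    }
}
```

Both forms produce the same values; the difference is only in how clearly the shared initial value is expressed.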

> Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11384
>                 URL: https://issues.apache.org/jira/browse/HDFS-11384
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: Konstantin Shvachko
>         Attachments: balancer.day.png, balancer.week.png, HDFS-11384.001.patch, HDFS-11384.002.patch, HDFS-11384.003.patch, HDFS-11384.004.patch, HDFS-11384.005.patch, HDFS-11384.006.patch, HDFS-11384-007.patch, HDFS-11384.008.patch
>
>
> Running the balancer on a Hadoop cluster with more than 3000 DataNodes causes the NameNode's rpc.CallQueueLength to spike. We observed that this can cause HBase cluster failure due to RegionServer WAL timeouts.

