hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-11384) Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike
Date Tue, 25 Apr 2017 02:02:04 GMT

     [ https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Konstantin Shvachko updated HDFS-11384:
    Attachment: HDFS-11384.008.patch

# It took some time to reproduce the test failures; I did not see any on my local box.
It looks like the solution is to mock FSNamesystem before starting the DataNodes; otherwise the behavior
is non-deterministic. I changed that, and now the test runs consistently on my local box. Let's try
# The findbugs warnings are not related to the patch.
# There are 2 checkstyle warnings.
#* One complains that doTest() has more than 7 parameters. I don't know why it is that magic
number, but doTest() already had 8 parameters and I added one.
#* The second is about an inner assignment, which is intentional here: I want the two variables
to start with the same value, and splitting the line into two statements would lose that
meaning.
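For context, an inner assignment is an assignment used as an expression, such as the chained form `a = b = value`, which is the kind of construct checkstyle's InnerAssignment check reports. The following is a generic Java illustration of that pattern, not code taken from the patch; the class, method, and variable names are hypothetical.

```java
public class InnerAssignmentDemo {
    /** Hypothetical helper: returns two variables initialized by one chained assignment. */
    static long[] initPair(long value) {
        long first;
        long second;
        // Chained (inner) assignment: 'second = value' is evaluated first and its
        // result is assigned to 'first', so both start with the same value.
        first = second = value;
        return new long[] {first, second};
    }

    public static void main(String[] args) {
        long[] pair = initPair(System.currentTimeMillis());
        System.out.println(pair[0] == pair[1]);  // prints "true"
    }
}
```

Keeping the assignment on one line makes the "same initial value" intent explicit, which is the trade-off the comment above describes.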

> Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike
> -------------------------------------------------------------------------------------------------
>                 Key: HDFS-11384
>                 URL: https://issues.apache.org/jira/browse/HDFS-11384
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: Konstantin Shvachko
>         Attachments: balancer.day.png, balancer.week.png, HDFS-11384.001.patch, HDFS-11384.002.patch,
> HDFS-11384.003.patch, HDFS-11384.004.patch, HDFS-11384.005.patch, HDFS-11384.006.patch, HDFS-11384-007.patch,
> Running the balancer on a Hadoop cluster with more than 3000 DataNodes causes the NameNode's
> rpc.CallQueueLength to spike. We observed that this can cause an HBase cluster failure due to
> RegionServer WAL timeouts.
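One way to picture "dispersing" getBlocks calls is to stagger when each source starts issuing them, so that only a bounded number of calls reach the NameNode in any one-second window instead of all sources firing at once. The sketch below assumes a fixed budget of calls per second and delays source i by (i / budget) seconds; the names `startDelayMillis`, `sources`, and `callsPerSecond` are hypothetical, and this is an illustration of the idea, not the patch's actual implementation. The real getBlocks RPC is replaced by a print statement.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GetBlocksDisperser {
    /**
     * Hypothetical helper: delay before source {@code index} issues its first
     * getBlocks call, so that at most {@code callsPerSecond} sources start in
     * each one-second window.
     */
    static long startDelayMillis(int index, int callsPerSecond) {
        if (callsPerSecond <= 0) {
            throw new IllegalArgumentException("callsPerSecond must be positive");
        }
        return (index / callsPerSecond) * 1000L;
    }

    public static void main(String[] args) throws InterruptedException {
        final int sources = 50;         // hypothetical number of source DataNodes
        final int callsPerSecond = 20;  // hypothetical getBlocks budget
        // One thread per source so every source actually receives its staggered delay.
        ExecutorService pool = Executors.newFixedThreadPool(sources);
        for (int i = 0; i < sources; i++) {
            final int idx = i;
            final long delay = startDelayMillis(i, callsPerSecond);
            pool.submit(() -> {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // Placeholder for the real RPC to the NameNode.
                System.out.printf("source %d: getBlocks after %d ms%n", idx, delay);
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

With 50 sources and a budget of 20, sources 0 to 19 start immediately, 20 to 39 after one second, and 40 to 49 after two seconds. A token-bucket rate limiter shared by all threads would be an alternative design that also bounds later calls, not just the first one per source.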

This message was sent by Atlassian JIRA
