hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11384) Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike
Date Sun, 29 Oct 2017 22:21:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16224231#comment-16224231 ]

Konstantin Shvachko commented on HDFS-11384:

Hey [~ywskycn], we want to disperse only the initial RPCs. Once they are dispersed, the rest
of them should follow the same pattern. Therefore we do not need to delay {{dispatchBlocks(0)}}
once we start reusing the threads ({{j >= concurrentThreads}}), as
[explained in this comment|https://issues.apache.org/jira/browse/HDFS-11384?focusedCommentId=15961620&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15961620].
Hope this makes sense.
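
To illustrate the idea, here is a minimal sketch, not the code from the patch. The class {{DispersedStart}}, the helper {{initialDelayMs}}, and the {{rpcPerSec}} parameter are hypothetical names made up for this example. It assumes the first {{concurrentThreads}} sources are staggered in groups of {{rpcPerSec}} per second, and that any source with {{j >= concurrentThreads}} starts with no delay because it runs on a reused thread.

```java
import java.util.Arrays;

public class DispersedStart {

  /**
   * Hypothetical helper: delay in milliseconds before source j issues its
   * first getBlocks call. Only the first concurrentThreads sources are
   * staggered; later sources reuse threads and need no extra delay.
   */
  static long initialDelayMs(int j, int concurrentThreads, int rpcPerSec) {
    if (j >= concurrentThreads) {
      // Threads are being reused; the initial staggering already spread the calls.
      return 0L;
    }
    // Assumption: at most rpcPerSec sources start within each one-second slot.
    return (j / rpcPerSec) * 1000L;
  }

  public static void main(String[] args) {
    long[] delays = new long[8];
    for (int j = 0; j < delays.length; j++) {
      delays[j] = initialDelayMs(j, 6, 2);
    }
    // prints [0, 0, 1000, 1000, 2000, 2000, 0, 0]
    System.out.println(Arrays.toString(delays));
  }
}
```

With 6 concurrent threads and 2 RPCs per second, sources 0 to 5 start over three seconds, while sources 6 and 7 get a zero delay since they only run once an earlier source has finished.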

> Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike
> -------------------------------------------------------------------------------------------------
>                 Key: HDFS-11384
>                 URL: https://issues.apache.org/jira/browse/HDFS-11384
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: Konstantin Shvachko
>             Fix For: 2.9.0, 2.7.4, 3.0.0-alpha4, 2.8.2
>         Attachments: HDFS-11384-007.patch, HDFS-11384-branch-2.7.011.patch, HDFS-11384-branch-2.8.011.patch,
>                      HDFS-11384.001.patch, HDFS-11384.002.patch, HDFS-11384.003.patch, HDFS-11384.004.patch, HDFS-11384.005.patch,
>                      HDFS-11384.006.patch, HDFS-11384.008.patch, HDFS-11384.009.patch, HDFS-11384.010.patch, HDFS-11384.011.patch,
>                      balancer.day.png, balancer.week.png
> Running the balancer on a Hadoop cluster with more than 3000 DataNodes causes the
> NameNode's rpc.CallQueueLength to spike. We observed that this can cause HBase cluster
> failure due to RegionServer WAL timeouts.
