hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11384) Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike
Date Wed, 12 Apr 2017 02:10:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965266#comment-15965266 ]

Konstantin Shvachko commented on HDFS-11384:
--------------------------------------------

* I am usually very conservative about introducing new configuration parameters. Parameters
seem to give you flexibility to adjust them, but in many cases administrators don't know what
to do with that flexibility, because there are so many of them. I prefer to start with a reasonable
constant value, and add a config variable later if _other_ values are needed in certain
cases. In the end, adding configs is easy, but you can never remove them.
In this particular case the BALANCER_NUM_RPC_PER_SEC is chosen so that big clusters would
distribute _initial_ RPC requests over 10 secs, and it does not affect small clusters at all.
I think we are good with the constant set to 20 for now, but let me know if you see use cases
for different values.
* Fixed the typo in 004 patch. Thanks [~zhz].
* This would be a typical misuse of Preconditions, as we do in many cases in the code, and
as has been discussed previously on many occasions. It is an assert, because we assume the condition
should never happen. If it does, it's a bug, which should be caught during testing with the {{-ea}}
option. At runtime we want to avoid checking any extra conditions, for performance reasons.
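
To make the arithmetic behind the second paragraph of the first point concrete, here is a minimal sketch, not the code from the patch. Only the constant name BALANCER_NUM_RPC_PER_SEC and its value 20 come from the comment above; the class DispersedStart, the method startDelayMillis, and the figure of 200 concurrent requests are hypothetical, chosen for illustration. It assumes each initial request gets a 0-based index and is delayed by one second per full batch of 20 ahead of it. The index check is written as an assert, in line with the third point: it is active only when the JVM runs with -ea.

```java
// Illustrative sketch only; names other than BALANCER_NUM_RPC_PER_SEC are hypothetical.
class DispersedStart {
    // Value taken from the comment above: at most 20 initial RPCs per second.
    static final int BALANCER_NUM_RPC_PER_SEC = 20;

    /**
     * Delay in milliseconds before the initial request with the given
     * 0-based index is sent. Requests 0..19 start immediately, 20..39
     * start after one second, and so on.
     */
    static long startDelayMillis(int index) {
        // A negative index would be a programming error, so this is an
        // assert (checked only with -ea), not a runtime precondition.
        assert index >= 0 : "request index must be non-negative";
        return (index / BALANCER_NUM_RPC_PER_SEC) * 1000L;
    }

    public static void main(String[] args) {
        // Small cluster (assumed 10 requests): every delay is zero.
        System.out.println("small, last delay ms: " + startDelayMillis(9));
        // Large cluster (assumed 200 requests): delays grow in 1 s steps.
        System.out.println("large, last delay ms: " + startDelayMillis(199));
    }
}
```

With 20 or fewer initial requests every delay is zero, so a small cluster behaves as before. With the assumed 200 requests, the last batch starts 9 seconds in, so the requests occupy ten one-second slots, which is consistent with the 10 secs mentioned above.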

> Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11384
>                 URL: https://issues.apache.org/jira/browse/HDFS-11384
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>         Attachments: balancer.day.png, balancer.week.png, HDFS-11384.001.patch, HDFS-11384.002.patch, HDFS-11384.003.patch, HDFS-11384.004.patch
>
>
> Running the balancer on a Hadoop cluster with more than 3000 DataNodes causes a spike in the
> NameNode's rpc.CallQueueLength. We observed that this can cause HBase cluster
> failures due to RegionServer WAL timeouts.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

