hadoop-hdfs-issues mailing list archives

From "Stephen O'Donnell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-14576) Avoid block report retry and slow down namenode startup
Date Mon, 17 Jun 2019 19:24:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865911#comment-16865911 ]

Stephen O'Donnell commented on HDFS-14576:
------------------------------------------

As [~jojochuang] mentioned, we used to see a lot of issues like this, but in later CDH versions
several patches have been backported that made the initial block report problem largely disappear.
 Unfortunately I don't have the list of Jiras and their relative impact.

Have you investigated using dfs.blockreport.initialDelay for the datanodes? I believe that
will cause the datanode to delay its initial block report by a random interval between zero
and that setting. If you know your average startup time for the cluster, perhaps you could
set that value to something close to the average startup time and then hopefully the DNs would
send their initial block reports over that interval rather than all at once, spreading the
load more evenly.
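
For illustration only, a minimal hdfs-site.xml sketch for the datanodes. It assumes an
average cluster startup time of roughly 10 minutes; the value 600 is a hypothetical figure
chosen for the example, not a recommendation, and the setting is interpreted in seconds:

```xml
<!-- hdfs-site.xml on each datanode (sketch; 600 is an assumed value) -->
<property>
  <name>dfs.blockreport.initialDelay</name>
  <!-- Upper bound, in seconds, of the random delay before the first block report.
       The default is 0, which means no delay. -->
  <value>600</value>
</property>
```

With a setting like this, each DN should pick its own random delay somewhere between 0 and
600 seconds, so the right value depends on how long your namenode actually takes to start.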

For info, what version of HDFS are you running where you see these problems?

> Avoid block report retry and slow down namenode startup
> -------------------------------------------------------
>
>                 Key: HDFS-14576
>                 URL: https://issues.apache.org/jira/browse/HDFS-14576
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: He Xiaoqiao
>            Assignee: He Xiaoqiao
>            Priority: Major
>
> During namenode startup, the load is very high because the namenode has to process every
datanode's block report one by one. If block reports from hundreds of datanodes are pending,
the problem becomes more serious, even though #processFirstBlockReport is processed much more
efficiently than ordinary block reports. Some datanodes will then retry their block reports,
which lengthens the restart time. I think we should filter out block report requests (arriving
via datanode block report retries) that have already been processed and return directly, which
would shorten the restart time. I want to note that the benefit of this proposal may be
noticeable only on large clusters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


