hadoop-hdfs-issues mailing list archives

From "Chackaravarthy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10365) FullBlockReports retransmission delays NN startup time in large cluster.
Date Wed, 04 May 2016 19:20:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271252#comment-15271252 ]

Chackaravarthy commented on HDFS-10365:

Thanks [~cnauroth] for the response. These fixes seem relevant to the issue we are
currently facing. We will see if we can backport them.

As a quick fix for 2.6.0, do you think this can be solved by tuning any configuration? Also,
is there any guideline for setting the service handler count based on cluster size?

> FullBlockReports retransmission delays NN startup time in large cluster.
> ------------------------------------------------------------------------
>                 Key: HDFS-10365
>                 URL: https://issues.apache.org/jira/browse/HDFS-10365
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.6.0
>         Environment: version - hadoop-2.6.0 (hdp-2.2)
> DN - 1200 nodes
>            Reporter: Chackaravarthy
>            Priority: Critical
> Whenever the NN is restarted, it takes a very long time to return to a stable state, i.e.
> the last contact time remains above 1 or 2 minutes continuously for around 3 to 4 hours.
> This is mainly because most DNs time out (60s) in the blockReport (FBR) RPC call and
> then keep resending the FBR.

