hadoop-hdfs-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node
Date Wed, 29 Oct 2014 18:05:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188700#comment-14188700

Hadoop QA commented on HDFS-7097:

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  against trunk revision ec63a3f.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified
test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version
2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:


    The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:


    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8580//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8580//console

This message is automatically generated.

> Allow block reports to be processed during checkpointing on standby name node
> -----------------------------------------------------------------------------
>                 Key: HDFS-7097
>                 URL: https://issues.apache.org/jira/browse/HDFS-7097
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch
> On a reasonably busy HDFS cluster, there is a stream of creates, causing data nodes to
generate incremental block reports.  When a standby name node is checkpointing, RPC handler
threads trying to process a full or incremental block report are blocked on the name system's
{{fsLock}}, because the checkpointer holds the read lock on it while block report processing
needs the write lock.  This can create a serious problem if the namespace is large and
checkpointing takes a long time.
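The contention described above can be reproduced in miniature with a plain java.util.concurrent ReentrantReadWriteLock standing in for {{fsLock}}. This is a simplified sketch, not HDFS code: the class and method names are hypothetical, and it assumes that block report processing takes the write lock while the checkpointer holds the read lock. A simulated handler tries the write lock with a timeout and gives up, because the simulated checkpoint keeps its read lock for the whole save.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class CheckpointLockContention {
    // Hypothetical stand-in for the name system's fsLock.
    static final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);

    /** Returns true if a simulated block report obtains the write lock within timeoutMs. */
    static boolean blockReportDuringCheckpoint(long timeoutMs) throws InterruptedException {
        CountDownLatch checkpointStarted = new CountDownLatch(1);
        CountDownLatch checkpointDone = new CountDownLatch(1);
        Thread checkpointer = new Thread(() -> {
            fsLock.readLock().lock(); // held for the whole (simulated) fsimage save
            try {
                checkpointStarted.countDown();
                checkpointDone.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                fsLock.readLock().unlock();
            }
        });
        checkpointer.start();
        checkpointStarted.await();
        // An RPC handler processing a block report needs the write lock,
        // which cannot be granted while any read lock is held.
        boolean acquired = fsLock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        if (acquired) {
            fsLock.writeLock().unlock();
        }
        checkpointDone.countDown(); // end the simulated checkpoint
        checkpointer.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        // prints "block report processed: false"
        System.out.println("block report processed: " + blockReportDuringCheckpoint(200));
    }
}
```

In a real name node the handler would wait indefinitely rather than time out, which is how every handler thread ends up parked on the lock.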
> All available RPC handlers can be tied up very quickly. If you have 100 handlers, it
only takes 34 file creates.  If a separate service RPC port is not used, the HA transition
request will have to wait in the call queue for minutes. Even if a separate service RPC port is
configured, heartbeats from data nodes will be blocked. A standby NN with a large namespace can
lose all data nodes after checkpointing.  The RPC calls will also be retransmitted by data nodes
many times, filling up the call queue and potentially causing listen queue overflow.
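The figure of 34 creates for 100 handlers is consistent with the following arithmetic, under assumptions not stated in the report: each file has a replication factor of 3, each of the three data nodes sends one incremental block report for it, and every report ties up one handler until the checkpoint finishes. The hypothetical helper below computes the smallest number of creates whose reports cover all handlers.

```java
final class HandlerExhaustion {
    /**
     * Number of file creates needed to tie up every RPC handler, assuming each
     * create makes `replication` data nodes send one incremental block report
     * and each report occupies one handler for the rest of the checkpoint.
     */
    static int createsToExhaust(int handlers, int replication) {
        // Ceiling division: the smallest n with n * replication >= handlers.
        return (handlers + replication - 1) / replication;
    }

    public static void main(String[] args) {
        System.out.println(createsToExhaust(100, 3)); // prints 34
    }
}
```

With 100 handlers and replication 3, 33 creates produce only 99 reports, while 34 produce 102, enough to occupy every handler.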
> Since block reports do not modify any state that is saved to the fsimage, I propose
letting them through during checkpointing.
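One way to realize the proposal, sketched here purely as an illustration, is to stop the checkpointer from holding {{fsLock}} and have it take a separate lock that only namespace-changing work must acquire. The sketch assumes that on a standby the namespace changes only through edit log tailing, so the tailer takes the hypothetical cpLock as well as the write lock, while block report processing takes only the write lock. The names CheckpointLockSketch, cpLock, tryProcessBlockReport and tryApplyEdits are assumptions, not taken from the attached patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class CheckpointLockSketch {
    final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true); // guards name system state
    final ReentrantLock cpLock = new ReentrantLock(); // hypothetical: excludes namespace edits during a checkpoint

    /** Checkpointer: holds cpLock for the whole save, but not fsLock. */
    void checkpoint(CountDownLatch started, CountDownLatch finish) throws InterruptedException {
        cpLock.lockInterruptibly();
        try {
            started.countDown();
            finish.await(); // simulate saving a large fsimage
        } finally {
            cpLock.unlock();
        }
    }

    /** Block report: touches only block state, so it needs fsLock but not cpLock. */
    boolean tryProcessBlockReport(long timeoutMs) throws InterruptedException {
        if (!fsLock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) return false;
        try { return true; } finally { fsLock.writeLock().unlock(); }
    }

    /** Edit log tailing: changes the namespace, so it must wait for the checkpoint. */
    boolean tryApplyEdits(long timeoutMs) throws InterruptedException {
        if (!cpLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) return false;
        try {
            fsLock.writeLock().lock();
            try { return true; } finally { fsLock.writeLock().unlock(); }
        } finally {
            cpLock.unlock();
        }
    }

    /** Runs a simulated checkpoint and returns {blockReportOk, editsOk} observed during it. */
    static boolean[] duringCheckpoint(long timeoutMs) throws InterruptedException {
        CheckpointLockSketch ns = new CheckpointLockSketch();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finish = new CountDownLatch(1);
        Thread checkpointer = new Thread(() -> {
            try {
                ns.checkpoint(started, finish);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        checkpointer.start();
        started.await();
        boolean[] result = { ns.tryProcessBlockReport(timeoutMs), ns.tryApplyEdits(timeoutMs) };
        finish.countDown(); // end the simulated checkpoint
        checkpointer.join();
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = duringCheckpoint(200);
        // prints "block report processed: true, edits applied: false"
        System.out.println("block report processed: " + r[0] + ", edits applied: " + r[1]);
    }
}
```

During the simulated checkpoint the block report gets the write lock immediately, while edit tailing times out waiting for cpLock, so the namespace stays unchanged while the fsimage is written.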

This message was sent by Atlassian JIRA
