ambari-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-19289) HDFS Service check fails if previous active NN is down
Date Fri, 23 Dec 2016 08:58:58 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-19289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15772351#comment-15772351 ]

Hadoop QA commented on AMBARI-19289:
------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12844529/AMBARI-19289_trunk.01.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified
test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:red}-1 core tests{color}.  The test build failed in ambari-server 

Test results: https://builds.apache.org/job/Ambari-trunk-test-patch/9808//testReport/
Console output: https://builds.apache.org/job/Ambari-trunk-test-patch/9808//console

This message is automatically generated.

> HDFS Service check fails if previous active NN is down
> ------------------------------------------------------
>
>                 Key: AMBARI-19289
>                 URL: https://issues.apache.org/jira/browse/AMBARI-19289
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.4.2
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>         Attachments: AMBARI-19289_trunk.01.patch
>
>
> *Reproduce steps*
> # Enable namenode HA
> # Shut down the active namenode; the standby takes over
> # Run the HDFS service check
> The HDFS service check script uses
> {{hdfs dfsadmin -fs hdfs://mycluster -safemode get | grep OFF}}
> to check whether the namenode is out of safemode. However, this command fails if the first NN is down, without ever checking the state of the second NN. This is likely an HDFS bug similar to HDFS-8277.
> *Proposal*
> There are several approaches to fix this:
> # Loop over each namenode address and get its safemode state with {{hdfs dfsadmin -fs hdfs://nn_host:8020 -safemode get | grep OFF}}; as long as one NN returns OFF, consider DFS to be out of safemode and continue with the rest of the check. However, is it really necessary to add such complexity to a service check?
> # Remove the safemode check code entirely; if HDFS is in safemode, read/write operations will fail anyway, so the service check won't pass.
> I prefer #2 because it keeps the script simpler and works in all cases. Note that this is a service check: it should pass as long as HDFS is in a working state. It is not a namenode check.
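For reference, proposal #1 could be sketched roughly as below. This is a hypothetical illustration, not the patch attached to this issue; the host list, port, and function names are assumptions, and a real implementation would take the NN addresses from the Ambari cluster parameters.

```shell
#!/bin/sh
# Sketch of proposal #1: query each NameNode individually and treat HDFS
# as out of safemode as soon as any NN reports "Safe mode is OFF".

# safemode_state <nn_host> -- wraps the dfsadmin call against a single NN.
# Port 8020 is the usual default RPC port; adjust for your cluster.
safemode_state() {
  hdfs dfsadmin -fs "hdfs://$1:8020" -safemode get 2>/dev/null
}

# any_nn_active <nn_host>... -- returns 0 (success) if at least one NN
# reports OFF, 1 if none do (all down or all still in safemode).
any_nn_active() {
  for host in "$@"; do
    if safemode_state "$host" | grep -q OFF; then
      return 0
    fi
  done
  return 1
}
```

The per-host wrapper keeps the dfsadmin failure of a downed NN from aborting the whole check, which is exactly the weakness of the single `-fs hdfs://mycluster` invocation described above.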



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
