ambari-dev mailing list archives

From "Hari Sekhon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-10536) Ambari 2.0 HDP 2.2.4 => 2.2.0 stack rollback leaves one NameNode in inconsistent state, breaking HA and failover
Date Thu, 16 Apr 2015 16:32:58 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sekhon updated AMBARI-10536:
---------------------------------
    Description: 
After a failed stack upgrade of HDP 2.2.0 => 2.2.4 (AMBARI-10519) and subsequent rollback,
Ambari 2.0 leaves one of the HDFS HA NameNodes in an inconsistent state:
{code}2015-04-16 11:45:38,231 INFO  namenode.FSImage (FSEditLogLoader.java:loadFSEdits(138))
- Start loading edits file http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd,
http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd
2015-04-16 11:45:38,232 INFO  namenode.EditLogInputStream (RedundantEditLogInputStream.java:nextOp(176))
- Fast-forwarding stream 'http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd,
http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd'
to transaction ID 54367965
2015-04-16 11:45:38,232 INFO  namenode.EditLogInputStream (RedundantEditLogInputStream.java:nextOp(176))
- Fast-forwarding stream 'http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd'
to transaction ID 54367965
2015-04-16 11:45:38,284 ERROR namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(238))
- Encountered exception on operation RollingUpgradeOp [START, time=1429181084342]
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/nn is
in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback
first.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.checkUpgrade(FSImage.java:348)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startRollingUpgradeInternal(FSNamesystem.java:8322)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:750)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-04-16 11:45:39,111 FATAL ha.EditLogTailer (EditLogTailer.java:doWork(331)) - Unknown
error encountered while tailing edits. Shutting down standby NN.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/nn is
in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback
first.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.checkUpgrade(FSImage.java:348)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startRollingUpgradeInternal(FSNamesystem.java:8322)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:750)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-04-16 11:45:39,114 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with
status 1
2015-04-16 11:45:39,115 INFO  namenode.NameNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at <custom_scrubbed>/<custom_scrubbed>
************************************************************/{code}

The NameNode shut down as a result. After a restart it still does not work properly: haadmin
failover commands return similar exceptions complaining about this inconsistent state, which
should be visible in the NameNode logs I've uploaded.
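
The exception message suggests that a "previous" directory from the earlier upgrade attempt is still present under the NameNode storage directory (/data1/nn in the log above). Below is a minimal diagnostic sketch, assuming the usual HDFS storage layout in which the pre-upgrade snapshot sits in a "previous" subdirectory next to "current". The helper name find_previous_dirs is hypothetical, and the script only reports what it finds; it does not finalize or roll back anything.

```python
import os
import tempfile


def find_previous_dirs(name_dirs):
    """Return the storage directories that still contain a 'previous' subdirectory.

    Assumption: each entry is a NameNode storage directory (a value from
    dfs.namenode.name.dir, e.g. /data1/nn) laid out as <dir>/current and,
    if an upgrade has not been finalized or rolled back, <dir>/previous.
    """
    return [d for d in name_dirs if os.path.isdir(os.path.join(d, "previous"))]


if __name__ == "__main__":
    # Demonstration on throwaway directories so the sketch runs anywhere;
    # on a real host you would pass the configured NameNode directories instead.
    with tempfile.TemporaryDirectory() as root:
        clean = os.path.join(root, "nn_clean")
        stale = os.path.join(root, "nn_stale")
        os.makedirs(os.path.join(clean, "current"))
        os.makedirs(os.path.join(stale, "current"))
        os.makedirs(os.path.join(stale, "previous"))

        leftover = find_previous_dirs([clean, stale])
        for d in leftover:
            print("leftover 'previous' directory under", d)
```

Running the same check on both NameNodes might show whether the rollback cleaned up one host but not the other.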

Hari Sekhon
http://www.linkedin.com/in/harisekhon

  was:
After a failed stack upgrade of HDP 2.2.0 => 2.2.4 (AMBARI-10519) and subsequent rollback,
Ambari 2.0 leaves one of the HDFS HA NameNodes in an inconsistent state:
{code}2015-04-16 11:45:38,231 INFO  namenode.FSImage (FSEditLogLoader.java:loadFSEdits(138))
- Start loading edits file http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd,
http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd
2015-04-16 11:45:38,232 INFO  namenode.EditLogInputStream (RedundantEditLogInputStream.java:nextOp(176))
- Fast-forwarding stream 'http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd,
http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd'
to transaction ID 54367965
2015-04-16 11:45:38,232 INFO  namenode.EditLogInputStream (RedundantEditLogInputStream.java:nextOp(176))
- Fast-forwarding stream 'http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd'
to transaction ID 54367965
2015-04-16 11:45:38,284 ERROR namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(238))
- Encountered exception on operation RollingUpgradeOp [START, time=1429181084342]
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/nn is
in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback
first.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.checkUpgrade(FSImage.java:348)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startRollingUpgradeInternal(FSNamesystem.java:8322)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:750)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-04-16 11:45:39,111 FATAL ha.EditLogTailer (EditLogTailer.java:doWork(331)) - Unknown
error encountered while tailing edits. Shutting down standby NN.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/nn is
in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback
first.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.checkUpgrade(FSImage.java:348)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startRollingUpgradeInternal(FSNamesystem.java:8322)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:750)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-04-16 11:45:39,114 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with
status 1
2015-04-16 11:45:39,115 INFO  namenode.NameNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at <custom_scrubbed>/<custom_scrubbed>
************************************************************/{code}

The NameNode was shut down as a result, and after restarting it, it still doesn't work properly
as doing ha admin failover commands return similar exceptions complaining about this inconsistent
state.

Hari Sekhon
http://www.linkedin.com/in/harisekhon


> Ambari 2.0 HDP 2.2.4 => 2.2.0 stack rollback leaves one NameNode in inconsistent state, breaking HA and failover
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-10536
>                 URL: https://issues.apache.org/jira/browse/AMBARI-10536
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server, stacks
>    Affects Versions: 2.0.0
>         Environment: HDP 2.2.0.0 <= rollback <= 2.2.4.0
>            Reporter: Hari Sekhon
>            Priority: Critical
>         Attachments: broken-namenode-nn1.log.bz2, remaining-namenode-nn2.log.bz2
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
