ambari-dev mailing list archives

From "Hari Sekhon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-10536) Ambari 2.0 HDP 2.2.4 => 2.2.0 stack rollback leaves one NameNode in inconsistent state, breaking HA and failover
Date Thu, 16 Apr 2015 16:31:59 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sekhon updated AMBARI-10536:
---------------------------------
    Description: 
After a failed stack upgrade of HDP 2.2.0 => 2.2.4 (AMBARI-10519) and subsequent rollback,
Ambari 2.0 leaves one of the HDFS HA NameNodes in an inconsistent state:
{code}2015-04-16 11:45:38,231 INFO  namenode.FSImage (FSEditLogLoader.java:loadFSEdits(138)) - Start loading edits file http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd, http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd
2015-04-16 11:45:38,232 INFO  namenode.EditLogInputStream (RedundantEditLogInputStream.java:nextOp(176)) - Fast-forwarding stream 'http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd, http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd' to transaction ID 54367965
2015-04-16 11:45:38,232 INFO  namenode.EditLogInputStream (RedundantEditLogInputStream.java:nextOp(176)) - Fast-forwarding stream 'http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd' to transaction ID 54367965
2015-04-16 11:45:38,284 ERROR namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(238)) - Encountered exception on operation RollingUpgradeOp [START, time=1429181084342]
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/nn is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.checkUpgrade(FSImage.java:348)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startRollingUpgradeInternal(FSNamesystem.java:8322)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:750)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-04-16 11:45:39,111 FATAL ha.EditLogTailer (EditLogTailer.java:doWork(331)) - Unknown error encountered while tailing edits. Shutting down standby NN.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/nn is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.checkUpgrade(FSImage.java:348)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startRollingUpgradeInternal(FSNamesystem.java:8322)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:750)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-04-16 11:45:39,114 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2015-04-16 11:45:39,115 INFO  namenode.NameNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at <custom_scrubbed>/<custom_scrubbed>
************************************************************/{code}

The NameNode shut down as a result. Even after restarting it, it still does not work properly:
running hdfs haadmin failover commands returns similar exceptions complaining about this
inconsistent state.
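
For anyone hitting the same error, a quick way to confirm the state above may be to look for a leftover previous subdirectory in the NameNode storage directory: the message "previous fs state should not exist during upgrade" appears to come from FSImage.checkUpgrade, which refuses to proceed while a storage directory still contains previous/. The following is a minimal, read-only diagnostic sketch, assuming it runs on the affected NameNode host with read access to the name directories and that the VERSION files use the usual key=value layout. /data1/nn is the directory named in the log; the function names are hypothetical. It does not finalize or roll back anything.

```python
from pathlib import Path
from typing import Dict, List


def read_version(path: Path) -> Dict[str, str]:
    """Parse a VERSION file made of key=value lines, skipping '#' comments."""
    props: Dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def find_leftover_previous(name_dirs: List[str]) -> List[Dict[str, str]]:
    """Return one entry per name dir that still has a 'previous' subdirectory.

    Each entry also records the layoutVersion found in current/VERSION and
    previous/VERSION when those files exist, so the two sides can be compared.
    """
    findings: List[Dict[str, str]] = []
    for d in name_dirs:
        root = Path(d)
        if not (root / "previous").is_dir():
            continue  # clean directory: no pending upgrade state left behind
        finding = {"dir": str(root)}
        for side in ("current", "previous"):
            version_file = root / side / "VERSION"
            if version_file.is_file():
                finding[side + "_layoutVersion"] = read_version(version_file).get(
                    "layoutVersion", "?"
                )
        findings.append(finding)
    return findings


if __name__ == "__main__":
    import sys

    # Default to the directory from the log; pass other name dirs as arguments.
    for entry in find_leftover_previous(sys.argv[1:] or ["/data1/nn"]):
        print(entry)
```

Run it with the configured name directories as arguments; an empty output means none of the given paths contain a previous directory, while a printed entry points at the directory that would trip the check above.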

Hari Sekhon
http://www.linkedin.com/in/harisekhon


> Ambari 2.0 HDP 2.2.4 => 2.2.0 stack rollback leaves one NameNode in inconsistent state, breaking HA and failover
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-10536
>                 URL: https://issues.apache.org/jira/browse/AMBARI-10536
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server, stacks
>    Affects Versions: 2.0.0
>         Environment: HDP 2.2.0.0 <= rollback <= 2.2.4.0
>            Reporter: Hari Sekhon
>            Priority: Critical
>         Attachments: broken-namenode-nn1.log.bz2, remaining-namenode-nn2.log.bz2
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
