ambari-dev mailing list archives

From "Andrew Onischuk (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-12230) During HDP 2.1 to 2.2.6 upgrade dfs.journalnode.edits.dir is incorrectly changed
Date Wed, 01 Jul 2015 08:43:05 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-12230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Onischuk updated AMBARI-12230:
-------------------------------------
    Description: 
    2015-06-17 23:00:32,926 WARN ha.EditLogTailer (EditLogTailer.java:doWork(339)) - Edit log tailer interrupted
    java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:337)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
    2015-06-17 23:00:32,930 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1152)) - Starting services required for active state
    2015-06-17 23:00:32,946 INFO client.QuorumJournalManager (QuorumJournalManager.java:recoverUnfinalizedSegments(435)) - Starting recovery process for unclosed journal segments...
    2015-06-17 23:00:32,963 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.222.32.220:8485, 10.222.32.214:8485, 10.222.32.216:8485], stream=null))
    org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
    10.222.32.220:8485: Journal Storage Directory /hadoop/hdfs/journalnode/preprod not formatted


  was:
PROBLEM: The customer was following the Ambari 2.0.1 instructions for upgrading
the stack from HDP 2.1 to 2.2.6, found here:

<http://docs.hortonworks.com/HDPDocuments/Ambari-2.0.1.0/bk_upgrading_Ambari/content/_upgrading_the_hdp_stack_from_21_to_22.html>

When they tried to start the NN in section 3 (Complete the Upgrade), step 12
of those instructions, it failed with the following error:

    
    
    2015-06-17 23:00:32,926 WARN ha.EditLogTailer (EditLogTailer.java:doWork(339)) - Edit log tailer interrupted
    java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:337)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
    2015-06-17 23:00:32,930 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1152)) - Starting services required for active state
    2015-06-17 23:00:32,946 INFO client.QuorumJournalManager (QuorumJournalManager.java:recoverUnfinalizedSegments(435)) - Starting recovery process for unclosed journal segments...
    2015-06-17 23:00:32,963 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.222.32.220:8485, 10.222.32.214:8485, 10.222.32.216:8485], stream=null))
    org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
    10.222.32.220:8485: Journal Storage Directory /hadoop/hdfs/journalnode/preprod not formatted

    

BUSINESS IMPACT: Customer stuck during upgrade process. Attempting to roll
back will not work either.

SUPPORT ANALYSIS: The issue was caused by section 3, step 4, where they had to
run:

    
    
    python upgradeHelper.py --hostname $HOSTNAME --user $USERNAME --password $PASSWORD --clustername $CLUSTERNAME --fromStack=2.1 --toStack=2.2.x --upgradeCatalog=UpgradeCatalog_2.1_to_2.2.x.json update-configs
    

They had a custom path for dfs.journalnode.edits.dir set to
/data/hadoop/hdfs/journal. The command above changed it to
/hadoop/hdfs/journalnode, so the JNs reported that their storage was not
formatted. There were no warnings in Ambari to indicate an issue when they
started the JNs.
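
As a rough illustration of the missing warning, the sketch below compares a
snapshot of hdfs-site properties taken before update-configs with one taken
afterwards and reports any watched property whose value changed. The function
name, the snapshot dictionaries, and the watch list are all hypothetical; this
is not part of upgradeHelper.py or Ambari, and it assumes the two snapshots
have already been exported by some other means.

```python
def changed_properties(before, after, watched):
    """Return {name: (old, new)} for watched properties whose value changed.

    `before` and `after` are plain dicts of property name -> value
    (hypothetical snapshots taken around the update-configs step).
    """
    changes = {}
    for name in watched:
        old, new = before.get(name), after.get(name)
        # Only report properties that were set before and now differ.
        if old is not None and old != new:
            changes[name] = (old, new)
    return changes


# Values taken from this report; the snapshots themselves are illustrative.
before = {"dfs.journalnode.edits.dir": "/data/hadoop/hdfs/journal"}
after = {"dfs.journalnode.edits.dir": "/hadoop/hdfs/journalnode"}
result = changed_properties(before, after, ["dfs.journalnode.edits.dir"])
for name, (old, new) in result.items():
    print("WARNING: %s changed from %s to %s" % (name, old, new))
```

A check along these lines, run before starting the JNs, would have surfaced
the path change described above.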

STEPS TO REPRODUCE:  
Starting with an Ambari-installed HDP 2.1 cluster, change
dfs.journalnode.edits.dir from the default and set up NN HA. Then attempt to
follow the upgrade instructions

<http://docs.hortonworks.com/HDPDocuments/Ambari-2.0.1.0/bk_upgrading_Ambari/content/_upgrading_the_hdp_stack_from_21_to_22.html>

to upgrade the HDP stack from 2.1 to 2.2.6.
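
To confirm the reproduction on a JournalNode host, one option is to check
whether the directory the JN is now pointed at contains journal metadata. The
sketch below assumes the usual layout in which a formatted journal keeps a
VERSION file under <edits dir>/<journal id>/current; that layout and the
helper name are assumptions for illustration, and the journal id "preprod" is
taken from the error message in this report.

```python
import os


def journal_dir_looks_formatted(edits_dir, journal_id):
    """Return True if <edits_dir>/<journal_id>/current/VERSION exists.

    This is only a heuristic based on the assumed on-disk layout; it does
    not validate the contents of the VERSION file.
    """
    version_file = os.path.join(edits_dir, journal_id, "current", "VERSION")
    return os.path.isfile(version_file)


if __name__ == "__main__":
    # Paths from this report: the custom location and the one set by the upgrade.
    for path in ("/data/hadoop/hdfs/journal", "/hadoop/hdfs/journalnode"):
        print(path, journal_dir_looks_formatted(path, "preprod"))
```

If the custom path reports True and the new path reports False, the JN is
most likely reading from the wrong directory.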




> During HDP 2.1 to 2.2.6 upgrade dfs.journalnode.edits.dir is incorrectly changed
> --------------------------------------------------------------------------------
>
>                 Key: AMBARI-12230
>                 URL: https://issues.apache.org/jira/browse/AMBARI-12230
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>             Fix For: 2.1.0
>
>
>     2015-06-17 23:00:32,926 WARN ha.EditLogTailer (EditLogTailer.java:doWork(339)) - Edit log tailer interrupted
>     java.lang.InterruptedException: sleep interrupted
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:337)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
>     2015-06-17 23:00:32,930 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1152)) - Starting services required for active state
>     2015-06-17 23:00:32,946 INFO client.QuorumJournalManager (QuorumJournalManager.java:recoverUnfinalizedSegments(435)) - Starting recovery process for unclosed journal segments...
>     2015-06-17 23:00:32,963 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.222.32.220:8485, 10.222.32.214:8485, 10.222.32.216:8485], stream=null))
>     org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>     10.222.32.220:8485: Journal Storage Directory /hadoop/hdfs/journalnode/preprod not formatted



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
