ambari-dev mailing list archives

From "Alejandro Fernandez (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-11605) Restarting HistoryServer fails during RU because NameNode is in safemode
Date Tue, 02 Jun 2015 03:18:17 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-11605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Fernandez updated AMBARI-11605:
-----------------------------------------
    Attachment: AMBARI-11605.patch

> Restarting HistoryServer fails during RU because NameNode is in safemode
> ------------------------------------------------------------------------
>
>                 Key: AMBARI-11605
>                 URL: https://issues.apache.org/jira/browse/AMBARI-11605
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.1.0
>            Reporter: Alejandro Fernandez
>            Assignee: Alejandro Fernandez
>             Fix For: 2.1.0
>
>         Attachments: AMBARI-11605.patch
>
>
> When restarting the MapReduce HistoryServer for the first time during the Core Masters rolling
upgrade, the restart fails with the following:
> {noformat}
> 2015-05-28 20:03:32,540 - HdfsResource['/hdp/apps/2.3.0.0-2112/mapreduce'] {'security_enabled':
False, 'hadoop_bin_dir': '/usr/hdp/2.3.0.0-2112/hadoop/bin', 'keytab': [EMPTY], 'default_fs':
'hdfs://c1ha', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': [EMPTY], 'user':
'hdfs', 'owner': 'hdfs', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type':
'directory', 'action': ['create_on_execute'], 'mode': 0555}
> 2015-05-28 20:03:32,600 - checked_call['curl -L -w '%{http_code}' -X GET 'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=GETFILESTATUS&user.name=hdfs'']
{'logoutput': None, 'user': 'hdfs', 'quiet': False}
> 2015-05-28 20:03:37,862 - checked_call returned (0, '{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File
does not exist: /hdp/apps/2.3.0.0-2112/mapreduce"}}404')
> 2015-05-28 20:03:37,866 - checked_call['curl -L -w '%{http_code}' -X PUT 'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=MKDIRS&user.name=hdfs'']
{'logoutput': None, 'user': 'hdfs', 'quiet': False}
> 2015-05-28 20:03:37,993 - checked_call returned (0, '{"RemoteException":{"exception":"RetriableException","javaClassName":"org.apache.hadoop.ipc.RetriableException","message":"org.apache.hadoop.hdfs.server.namenode.SafeModeException:
Cannot create directory /hdp/apps/2.3.0.0-2112/mapreduce. Name node is in safe mode.\\nThe
reported blocks 414 needs additional 77 blocks to reach the threshold 0.9900 of total blocks
495.\\nThe number of live datanodes 4 has reached the minimum number 0. Safe mode will be
turned off automatically once the thresholds have been reached."}}403')
> {noformat}
> Retrying after this error fixes the problem.
> It turns out that, now that the HDFS commands run faster, the standby NameNode can still
be in safemode by the time the HistoryServer is restarted.
> For this reason, we must wait for both NameNodes to come out of safemode before proceeding
to any other services or Service Checks.
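
A minimal sketch of the wait described above, not the contents of the attached patch. It assumes the `hdfs` CLI is on the PATH and polls each NameNode with `hdfs dfsadmin -fs hdfs://<host:port> -safemode get`. The function names, the retry count, and the delay are hypothetical and chosen only for illustration.

```python
import subprocess
import time


def safemode_is_off(namenode_address):
    """Return True if the given NameNode (host:port) reports safemode OFF.

    Assumes the `hdfs` CLI is available; any failure to run it counts as
    "not ready yet" so that the caller keeps polling.
    """
    cmd = ["hdfs", "dfsadmin", "-fs", "hdfs://" + namenode_address,
           "-safemode", "get"]
    try:
        output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except (subprocess.CalledProcessError, OSError):
        return False
    return b"Safe mode is OFF" in output


def wait_for_safemode_off(namenode_addresses, check=safemode_is_off,
                          retries=40, delay_seconds=10, sleep=time.sleep):
    """Poll until every NameNode has left safemode or retries run out.

    `check` and `sleep` are injectable so the loop can be exercised
    without a real cluster. Returns True only if all NameNodes are out
    of safemode.
    """
    pending = list(namenode_addresses)
    for attempt in range(retries):
        # Keep only the NameNodes that are still in safemode.
        pending = [nn for nn in pending if not check(nn)]
        if not pending:
            return True
        if attempt < retries - 1:
            sleep(delay_seconds)
    return False
```

In an HA cluster the caller would pass the RPC addresses of both the active and the standby NameNode, and would move on to the HistoryServer restart only if the function returns True.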



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
