ambari-dev mailing list archives

From "Alejandro Fernandez (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-11605) Restarting HistoryServer fails during RU because NameNode is in safemode
Date Tue, 02 Jun 2015 03:19:17 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-11605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Fernandez updated AMBARI-11605:
-----------------------------------------
    Description: 
When restarting HistoryServer for the first time during the Core Masters rolling upgrade,
the restart fails with the following:

{noformat}
2015-05-28 20:03:32,540 - HdfsResource['/hdp/apps/2.3.0.0-2112/mapreduce'] {'security_enabled':
False, 'hadoop_bin_dir': '/usr/hdp/2.3.0.0-2112/hadoop/bin', 'keytab': [EMPTY], 'default_fs':
'hdfs://c1ha', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': [EMPTY], 'user':
'hdfs', 'owner': 'hdfs', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type':
'directory', 'action': ['create_on_execute'], 'mode': 0555}
2015-05-28 20:03:32,600 - checked_call['curl -L -w '%{http_code}' -X GET 'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=GETFILESTATUS&user.name=hdfs'']
{'logoutput': None, 'user': 'hdfs', 'quiet': False}
2015-05-28 20:03:37,862 - checked_call returned (0, '{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File
does not exist: /hdp/apps/2.3.0.0-2112/mapreduce"}}404')
2015-05-28 20:03:37,866 - checked_call['curl -L -w '%{http_code}' -X PUT 'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=MKDIRS&user.name=hdfs'']
{'logoutput': None, 'user': 'hdfs', 'quiet': False}
2015-05-28 20:03:37,993 - checked_call returned (0, '{"RemoteException":{"exception":"RetriableException","javaClassName":"org.apache.hadoop.ipc.RetriableException","message":"org.apache.hadoop.hdfs.server.namenode.SafeModeException:
Cannot create directory /hdp/apps/2.3.0.0-2112/mapreduce. Name node is in safe mode.\\nThe
reported blocks 414 needs additional 77 blocks to reach the threshold 0.9900 of total blocks
495.\\nThe number of live datanodes 4 has reached the minimum number 0. Safe mode will be
turned off automatically once the thresholds have been reached."}}403')
{noformat}

Retrying after this error fixes the problem.
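
Because a retry succeeds, it helps to tell a transient safemode rejection apart from other WebHDFS failures. The sketch below is illustrative only and is not the code in the attached patch: it assumes the output shape shown in the log above (a JSON body followed by the three-digit status that `curl -w '%{http_code}'` appends), and the function names `parse_curl_output` and `is_safemode_rejection` are hypothetical.

```python
import json


def parse_curl_output(output):
    """Split `curl -w '%{http_code}'` output into (parsed JSON body, status).

    Assumes the last three characters are the HTTP status code, as in the
    log lines quoted in this issue.
    """
    body, status = output[:-3], int(output[-3:])
    parsed = json.loads(body) if body.strip() else None
    return parsed, status


def is_safemode_rejection(output):
    """Return True if the response is a retriable safemode error (HTTP 403)."""
    parsed, status = parse_curl_output(output)
    if status != 403 or not isinstance(parsed, dict):
        return False
    remote = parsed.get("RemoteException", {})
    return (remote.get("exception") == "RetriableException"
            and "SafeModeException" in remote.get("message", ""))
```

A caller could retry the MKDIRS request only while `is_safemode_rejection` returns True and fail immediately on any other error.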

It turns out that, now that the HDFS commands run faster, the standby NameNode can still be
in safemode by the time the HistoryServer is restarted.
For this reason, we must wait for both NameNodes to come out of safemode before proceeding
to any other services or Service Checks.
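
One way to implement such a wait is to poll each NameNode until it reports that safemode is off. The following is a minimal sketch under stated assumptions, not the change in the attached patch: it relies on `hdfs dfsadmin -safemode get` printing "Safe mode is OFF" once a NameNode has left safemode, and the function names, NameNode URIs, retry count, and delay are all hypothetical.

```python
import subprocess
import time


def safemode_is_off(dfsadmin_output):
    """Return True if `hdfs dfsadmin -safemode get` output reports safemode OFF."""
    return "Safe mode is OFF" in dfsadmin_output


def query_safemode(namenode_uri):
    """Run `hdfs dfsadmin -fs <uri> -safemode get` and return its stdout."""
    cmd = ["hdfs", "dfsadmin", "-fs", namenode_uri, "-safemode", "get"]
    return subprocess.check_output(cmd, universal_newlines=True)


def wait_for_safemode_off(namenode_uris, query=query_safemode,
                          retries=30, delay_seconds=10, sleep=time.sleep):
    """Poll every NameNode until all report safemode OFF, or give up.

    Returns True once all NameNodes are out of safemode, False on timeout.
    The `query` and `sleep` parameters exist so the loop can be tested
    without a running cluster.
    """
    pending = list(namenode_uris)
    for attempt in range(retries):
        # Keep only the NameNodes that are still in safemode.
        pending = [nn for nn in pending if not safemode_is_off(query(nn))]
        if not pending:
            return True
        if attempt < retries - 1:
            sleep(delay_seconds)
    return False
```

For an HA pair this might be called as `wait_for_safemode_off(["hdfs://nn1.example.com:8020", "hdfs://nn2.example.com:8020"])`, where both hostnames are placeholders, before the upgrade moves on to the next service.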

  was:
When restarting mapreduce HistoryServer for the first time during the Core Masters rolling
upgrade, the restart fails with the following:

{noformat}
2015-05-28 20:03:32,540 - HdfsResource['/hdp/apps/2.3.0.0-2112/mapreduce'] {'security_enabled':
False, 'hadoop_bin_dir': '/usr/hdp/2.3.0.0-2112/hadoop/bin', 'keytab': [EMPTY], 'default_fs':
'hdfs://c1ha', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': [EMPTY], 'user':
'hdfs', 'owner': 'hdfs', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type':
'directory', 'action': ['create_on_execute'], 'mode': 0555}
2015-05-28 20:03:32,600 - checked_call['curl -L -w '%{http_code}' -X GET 'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=GETFILESTATUS&user.name=hdfs'']
{'logoutput': None, 'user': 'hdfs', 'quiet': False}
2015-05-28 20:03:37,862 - checked_call returned (0, '{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File
does not exist: /hdp/apps/2.3.0.0-2112/mapreduce"}}404')
2015-05-28 20:03:37,866 - checked_call['curl -L -w '%{http_code}' -X PUT 'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=MKDIRS&user.name=hdfs'']
{'logoutput': None, 'user': 'hdfs', 'quiet': False}
2015-05-28 20:03:37,993 - checked_call returned (0, '{"RemoteException":{"exception":"RetriableException","javaClassName":"org.apache.hadoop.ipc.RetriableException","message":"org.apache.hadoop.hdfs.server.namenode.SafeModeException:
Cannot create directory /hdp/apps/2.3.0.0-2112/mapreduce. Name node is in safe mode.\\nThe
reported blocks 414 needs additional 77 blocks to reach the threshold 0.9900 of total blocks
495.\\nThe number of live datanodes 4 has reached the minimum number 0. Safe mode will be
turned off automatically once the thresholds have been reached."}}403')
{noformat}

Retrying after this error fixes the problem.

It turns out that, now that the HDFS commands run faster, the standby NameNode can still be
in safemode by the time the HistoryServer is restarted.
For this reason, we must wait for both NameNodes to come out of safemode before proceeding
to any other services or Service Checks.


> Restarting HistoryServer fails during RU because NameNode is in safemode
> ------------------------------------------------------------------------
>
>                 Key: AMBARI-11605
>                 URL: https://issues.apache.org/jira/browse/AMBARI-11605
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.1.0
>            Reporter: Alejandro Fernandez
>            Assignee: Alejandro Fernandez
>             Fix For: 2.1.0
>
>         Attachments: AMBARI-11605.patch
>
>
> When restarting HistoryServer for the first time during the Core Masters rolling upgrade,
the restart fails with the following:
> {noformat}
> 2015-05-28 20:03:32,540 - HdfsResource['/hdp/apps/2.3.0.0-2112/mapreduce'] {'security_enabled':
False, 'hadoop_bin_dir': '/usr/hdp/2.3.0.0-2112/hadoop/bin', 'keytab': [EMPTY], 'default_fs':
'hdfs://c1ha', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': [EMPTY], 'user':
'hdfs', 'owner': 'hdfs', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type':
'directory', 'action': ['create_on_execute'], 'mode': 0555}
> 2015-05-28 20:03:32,600 - checked_call['curl -L -w '%{http_code}' -X GET 'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=GETFILESTATUS&user.name=hdfs'']
{'logoutput': None, 'user': 'hdfs', 'quiet': False}
> 2015-05-28 20:03:37,862 - checked_call returned (0, '{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File
does not exist: /hdp/apps/2.3.0.0-2112/mapreduce"}}404')
> 2015-05-28 20:03:37,866 - checked_call['curl -L -w '%{http_code}' -X PUT 'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=MKDIRS&user.name=hdfs'']
{'logoutput': None, 'user': 'hdfs', 'quiet': False}
> 2015-05-28 20:03:37,993 - checked_call returned (0, '{"RemoteException":{"exception":"RetriableException","javaClassName":"org.apache.hadoop.ipc.RetriableException","message":"org.apache.hadoop.hdfs.server.namenode.SafeModeException:
Cannot create directory /hdp/apps/2.3.0.0-2112/mapreduce. Name node is in safe mode.\\nThe
reported blocks 414 needs additional 77 blocks to reach the threshold 0.9900 of total blocks
495.\\nThe number of live datanodes 4 has reached the minimum number 0. Safe mode will be
turned off automatically once the thresholds have been reached."}}403')
> {noformat}
> Retrying after this error fixes the problem.
> It turns out that, now that the HDFS commands run faster, the standby NameNode can still be
in safemode by the time the HistoryServer is restarted.
> For this reason, we must wait for both NameNodes to come out of safemode before proceeding
to any other services or Service Checks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
