ambari-dev mailing list archives

From "Alejandro Fernandez (JIRA)" <>
Subject [jira] [Created] (AMBARI-12013) Datanode failed to restart during RU because the shutdownDatanode -upgrade command can fail sometimes
Date Fri, 19 Jun 2015 00:33:00 GMT
Alejandro Fernandez created AMBARI-12013:

             Summary: Datanode failed to restart during RU because the shutdownDatanode -upgrade command can fail sometimes
                 Key: AMBARI-12013
             Project: Ambari
          Issue Type: Bug
          Components: ari-server, ambari-server
    Affects Versions: 2.1.0
            Reporter: Alejandro Fernandez
            Assignee: Alejandro Fernandez
            Priority: Critical
             Fix For: 2.1.0

Deploy Test with RU from HDP to HDP-

Failed on: Restarting DataNode on ip-172-31-44-83.ec2.internal

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/",
line 151, in <module>
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/",
line 216, in execute
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/",
line 437, in restart
    self.stop(env, rolling_restart=rolling_restart)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/",
line 55, in stop
  File "/var/lib/ambari-agent/cache/common-services/HDFS/",
line 43, in pre_upgrade_shutdown
    Execute(command, user=params.hdfs_user, tries=1 )
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 157, in __init__
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 152,
in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 118,
in run_action
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/", line
254, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 140, in
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 291, in
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'hdfs dfsadmin -shutdownDatanode
upgrade' returned 255. shutdownDatanode: Shutdown already in progress.

There's a known issue in HDP (HDFS-7533) where shutting down the DataNode does not
work because not all writers have a responder running, but sendOOB() tries anyway.
If the shutdown command fails with the output "Shutdown already in progress", then Ambari
should fall back to calling datanode(action="stop"), which under the hood calls " stop datanode".