ambari-dev mailing list archives

From "Alejandro Fernandez (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AMBARI-12013) Datanode failed to restart during RU because the shutdownDatanode -upgrade command can fail sometimes
Date Fri, 19 Jun 2015 00:33:00 GMT
Alejandro Fernandez created AMBARI-12013:
--------------------------------------------

             Summary: Datanode failed to restart during RU because the shutdownDatanode -upgrade command can fail sometimes
                 Key: AMBARI-12013
                 URL: https://issues.apache.org/jira/browse/AMBARI-12013
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.1.0
            Reporter: Alejandro Fernandez
            Assignee: Alejandro Fernandez
            Priority: Critical
             Fix For: 2.1.0


Deploy Test with RU from HDP 2.2.0.0-2041 to HDP-2.3.0.0-2398

Failed on: Restarting DataNode on ip-172-31-44-83.ec2.internal

{code}
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 151, in <module>
    DataNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 216, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 437, in restart
    self.stop(env, rolling_restart=rolling_restart)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 55, in stop
    datanode_upgrade.pre_upgrade_shutdown()
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py", line 43, in pre_upgrade_shutdown
    Execute(command, user=params.hdfs_user, tries=1)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 157, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 254, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'hdfs dfsadmin -shutdownDatanode 0.0.0.0:8010 upgrade' returned 255. shutdownDatanode: Shutdown already in progress.
{code}

There is a known issue in HDP 2.2.0.0 (HDFS-7533) where shutting down the DataNode can fail because not all writers have a responder running, yet sendOOB() tries to notify them anyway.
If the shutdown command fails with the output "Shutdown already in progress", then Ambari should fall back to calling datanode(action="stop"), which under the hood runs "hadoop-daemon.sh stop datanode".
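Below is a minimal sketch of what that fallback could look like in pre_upgrade_shutdown() in datanode_upgrade.py. It assumes the existing Execute resource and Fail exception from resource_management, a datanode() helper in hdfs_datanode.py that wraps hadoop-daemon.sh, and an IPC-address parameter named dfs_dn_ipc_address; those names are illustrative, not the committed patch.

{code}
# Sketch only (not the committed fix): tolerate HDFS-7533 during RU shutdown.
from resource_management.core.exceptions import Fail
from resource_management.core.logger import Logger
from resource_management.core.resources.system import Execute


def pre_upgrade_shutdown():
  import params
  from hdfs_datanode import datanode  # assumed wrapper around hadoop-daemon.sh

  # Same command that failed in the traceback above; the parameter name for
  # the DataNode IPC address is an assumption.
  command = "hdfs dfsadmin -shutdownDatanode %s upgrade" % params.dfs_dn_ipc_address

  try:
    Execute(command, user=params.hdfs_user, tries=1)
  except Fail as err:
    # HDFS-7533: the DataNode may report that a shutdown is already underway.
    # Treat that as non-fatal and fall back to a plain daemon stop.
    if "Shutdown already in progress" in str(err):
      Logger.warning("shutdownDatanode reported a shutdown already in progress; "
                     "falling back to 'hadoop-daemon.sh stop datanode'")
      datanode(action="stop")
    else:
      raise
{code}

Matching on the error text keeps every other dfsadmin failure fatal, so a genuine misconfiguration still fails the restart as before.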



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
