ambari-issues mailing list archives

From "Jonathan Hurley (JIRA)" <>
Subject [jira] [Created] (AMBARI-18684) Webhcat server start failed during EU with BindException
Date Mon, 24 Oct 2016 16:49:58 GMT
Jonathan Hurley created AMBARI-18684:

             Summary: Webhcat server start failed during EU with BindException
                 Key: AMBARI-18684
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.2.0
            Reporter: Jonathan Hurley
            Assignee: Jonathan Hurley
            Priority: Blocker
             Fix For: 2.5.0

WebHCat may fail to restart during an upgrade due to the following exception:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/",
line 155, in <module>
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/",
line 219, in execute
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/",
line 530, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/",
line 42, in start
    webhcat_service(action='start', upgrade_type=upgrade_type)
  File "/usr/lib/python2.6/site-packages/ambari_commons/", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/",
line 54, in webhcat_service
    environment = environ)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 154, in __init__
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 160,
in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 124,
in run_action
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/", line
238, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 140, in
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 291, in
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'cd /var/run/webhcat ; /usr/hdp/current/hive-webhcat/sbin/
start' returned 1. 

WARN  | 17 Oct 2016 12:53:02,999 | org.eclipse.jetty.util.component.AbstractLifeCycle | FAILED
org.eclipse.jetty.server.Server@19a639d8: Address already in use Address already in use
        at Method)

The problem seems to be caused by WebHCat failing to stop before it is upgraded. Code was
added in AMBARI-12695 to handle WebHCat not stopping, but it does not look correct: both of
the checks it relies on return 0.

- Return Code 0 (prevents the kill -9 from running due to {{not_if}})
! (ls /var/run/webhcat/ >/dev/null 2>&1 && ps -p `/var/lib/ambari-agent/
su hcat -l -s /bin/bash -c 'cat /var/run/webhcat/'` >/dev/null 2>&1)
|| ( sleep 10 && ! (ls /var/run/webhcat/ >/dev/null 2>&1 &&
ps -p ` su hcat -l -s /bin/bash -c 'cat /var/run/webhcat/'` >/dev/null
2>&1) )

- Return Code 0 (prevents Fail from being raised)
! (ls /var/run/webhcat/ >/dev/null 2>&1 && ps -p `/var/lib/ambari-agent/
su hcat -l -s /bin/bash -c 'cat /var/run/webhcat/'` >/dev/null 2>&1)
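
One way a PID-file check of this shape can report "stopped" while the port is still bound is sketched below. This is only an illustration under stated assumptions, not the Ambari code: the helper names pid_check_says_stopped and port_in_use are hypothetical, the PID-file path is a placeholder, and the sketch assumes the failure mode is a missing or stale PID file while a listener still holds the port. It is written for Python 3 rather than the agent's Python 2.6.

```python
import errno
import os
import socket


def pid_check_says_stopped(pid_file):
    """Hypothetical mirror of `! (ls PIDFILE && ps -p $(cat PIDFILE))`.

    Returns True when the check would exit 0, i.e. "nothing to kill".
    """
    if not os.path.isfile(pid_file):
        # No PID file: the shell check exits 0 even if a server is running.
        return True
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except ValueError:
        # Unparseable PID: `ps -p` would fail, so the check exits 0.
        return True
    try:
        os.kill(pid, 0)  # signal 0 only tests whether the process exists
    except OSError as exc:
        # ESRCH: no such process. EPERM: exists but owned by another user.
        return exc.errno == errno.ESRCH
    return False


def port_in_use(port, host="127.0.0.1"):
    """Independent check (an assumption, not in Ambari): try to bind the port."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise
    finally:
        probe.close()
    return False


if __name__ == "__main__":
    # Simulate a server that is still listening but has no PID file.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    print("pid check says stopped:", pid_check_says_stopped("/nonexistent/placeholder.pid"))
    print("port still in use:", port_in_use(port))
    listener.close()
```

In this scenario both calls report True: the PID-based check sees nothing to kill, yet a later start on the same port would hit "Address already in use".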
