ambari-dev mailing list archives

From "Dmitry Lysnichenko (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AMBARI-8185) Services fail to start when pid file is empty
Date Thu, 06 Nov 2014 16:22:33 GMT
Dmitry Lysnichenko created AMBARI-8185:
------------------------------------------

             Summary: Services fail to start when pid file is empty
                 Key: AMBARI-8185
                 URL: https://issues.apache.org/jira/browse/AMBARI-8185
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 1.6.1
            Reporter: Dmitry Lysnichenko
            Assignee: Dmitry Lysnichenko
             Fix For: 2.0.0


Witnessed at a customer site:
* Storm Supervisor server had a pid file at {{/var/run/storm/supervisor.pid}}
* This file, while present, had no content
* The stack file, {{service.py}}, detects a running process using this call:
{noformat}
  no_op_test = format("ls {pid_file} >/dev/null 2>&1 && ps `cat {pid_file}` >/dev/null 2>&1")
{noformat}
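* When the file is empty, {{cat}} produces no output, so {{ps}} runs with no arguments and
returns 0 (success). The check concludes that the process is running, and the startup command
does not run.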
* Changed the command to
{noformat}
  no_op_test = format("ls {pid_file} >/dev/null 2>&1 && ps -p `cat {pid_file}` >/dev/null 2>&1")
{noformat}
which correctly reports that the process is not running, so startup can continue.
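
For illustration only, the same check could also be done in Python rather than through a shell one-liner. The sketch below is not the Ambari implementation; the helper names {{read_pid}} and {{is_process_running}} are hypothetical, and it assumes a Unix host where {{os.kill(pid, 0)}} probes for a process without delivering a signal. An empty or malformed pid file is treated as "not running":

```python
import errno
import os
import re


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if the file is
    missing, empty, or does not contain a positive integer."""
    try:
        with open(pid_file) as f:
            text = f.read().strip()
    except (IOError, OSError):
        return None  # missing or unreadable file
    if not re.fullmatch(r"[0-9]+", text):
        return None  # empty file or garbage content
    pid = int(text)
    return pid if pid > 0 else None


def is_process_running(pid_file):
    """Hypothetical helper: True only if pid_file names a live process."""
    pid = read_pid(pid_file)
    if pid is None:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is sent
    except OSError as e:
        # EPERM means the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True
```

With this shape, the empty {{supervisor.pid}} case described above would yield {{False}}, so startup would proceed.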

The customer reports that they have seen this behavior with other services, but could not
reproduce it on-site.  This pattern is used frequently throughout the code base and should be
fixed for all services, including Storm.  Validating this change is the critical task here:
the change itself is small, but its effects are large in scope.

Also, in ambari/ambari-agent/conf/unix/ambari-agent there are a few invocations of similar code
with another bug:
{code}
          PID=`cat $PIDFILE`
          echo "Found $AMBARI_AGENT PID: $PID"
          if [ -z "`ps ax -o pid | grep $PID`" ]; then
{code}
Here, if $PID is, for example, 2111 and there is a running process with a PID like 22111,
{{grep}} matches the substring and we get a false positive (the agent refuses to start, saying
it is already running).
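
The substring problem can be reproduced without touching a real process table. The sketch below uses a made-up {{ps ax -o pid}} listing and two hypothetical helpers: one mirrors the unanchored {{grep $PID}}, the other compares each line as a whole field. It only illustrates the difference and is not a proposed patch for the agent script:

```python
def pid_in_listing_substring(pid, listing):
    """Mirrors `grep $PID`: matches anywhere inside a line."""
    return any(str(pid) in line for line in listing.splitlines())


def pid_in_listing_exact(pid, listing):
    """Compares the whole (whitespace-stripped) line against the PID."""
    return any(line.strip() == str(pid) for line in listing.splitlines())


# Made-up output in the shape of `ps ax -o pid`; only PID 22111 is "running".
listing = "  PID\n    1\n22111\n"

print(pid_in_listing_substring(2111, listing))  # True: false positive
print(pid_in_listing_exact(2111, listing))      # False: 2111 is not running
print(pid_in_listing_exact(22111, listing))     # True
```

In the shell script itself, an exact match could presumably be obtained by asking {{ps -p}} about the specific PID, as in the {{service.py}} change above, instead of grepping the full listing.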



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
