commons-issues mailing list archives

From "Steve Ash (JIRA)" <>
Subject [jira] Created: (DAEMON-183) Abnormal shutdown leaves the pidfile, which prevents subsequent startup
Date Thu, 21 Oct 2010 14:53:16 GMT
Abnormal shutdown leaves the pidfile, which prevents subsequent startup

                 Key: DAEMON-183
             Project: Commons Daemon
          Issue Type: Bug
          Components: Procrun
    Affects Versions: 1.0.3
            Reporter: Steve Ash
            Priority: Trivial

This is really a trivial issue, so you may want to just close it as WONTFIX, but it does represent
an inconsistency that I don't feel I can release into production, so I'm documenting it here.

When using the pidfile with procrun, if the pidfile isn't deleted then the next startup fails,
indicating that a pidfile exists.  Because I had configured the service incorrectly (my stopmode
was not set, so my main thread never returned, causing the stop to time out), the pidfile was
usually still there after the service came down.  This in and of itself seems like it may
be an issue.

Nonetheless, on a subsequent startup, it failed, indicating that a pidfile existed, but then
deleted the existing pidfile.  So a second attempt to start would succeed.  It just
felt a little strange that it would fail the first time and then work the second time.  I
don't really know if it's wrong, but I know that my customers would feel this is fragile/weird.
Thus, I am just not using the pidfile.
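
The fail-then-succeed behavior above is what one would expect when cleanup removes whatever
file the pidfile path names, regardless of which process created it.  Here is a minimal sketch
of one alternative, written in Python rather than procrun's C and not taken from the procrun
source: mark the pidfile as owned only after an exclusive create succeeds, and delete it only
when owned.  The PidFile class and its method names are hypothetical.

```python
import os


class PidFile:
    """Hypothetical helper: removes the pidfile only if this process created it."""

    def __init__(self, path):
        self.path = path
        self.owned = False  # set only after a successful exclusive create

    def acquire(self):
        """Create the pidfile atomically; return False if it already exists."""
        try:
            # O_EXCL makes the create fail instead of overwriting an existing file.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            return False
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        self.owned = True
        return True

    def release(self):
        """Delete the pidfile only when this instance created it."""
        if self.owned:
            os.remove(self.path)
            self.owned = False
```

With this shape, a startup that aborts because the file already exists leaves the file alone,
so repeated attempts fail the same way until the existing pidfile is dealt with explicitly.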

So a few thoughts:

1) Should the pidfile check go further and query for a running process with the expected image
(servicename.exe) and process id?  If no such process exists, assume this is an orphaned pidfile,
delete it, and then continue startup.
2) Obviously, if SCM or an external user kills the process then you can't delete the file,
but the timeout that I experienced came, I think, from SCM and not from the timeout in serviceStop
(e.g. I don't think I had a "Worker was killed" message).  So are you aware of a problem with
the timeout logic where the SCM will force the process down instead of waiting for procrun
to time out?
3) Today, if the process aborts startup because the pidfile already exists, the gPidfileName
global has already been set, and thus it deletes the pidfile (which is why the second attempt
to start succeeds).  What happens if this pidfile represents a real, already running process?
Is the other process locking it, so that the delete would fail?  Or would it successfully delete
the pidfile, now allowing multiple concurrent instances to run?
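
The check suggested in point 1 could look roughly like the sketch below.  It is only an
illustration of the idea, written in Python for POSIX systems, and is not how procrun (Windows,
C) does or would implement it.  The function names are hypothetical, the sketch checks only the
process id and not the image name, and treating an unreadable pidfile as stale is an assumed
policy.

```python
import os


def pid_is_running(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def clear_stale_pidfile(path):
    """Return True if startup may proceed, False if a live process owns the pidfile."""
    try:
        with open(path) as f:
            text = f.read().strip()
    except FileNotFoundError:
        return True  # no pidfile at all
    try:
        pid = int(text)
    except ValueError:
        pid = -1  # assumption: unreadable content is treated as stale
    if pid_is_running(pid):
        return False  # leave the pidfile alone; another instance is running
    os.remove(path)  # orphaned pidfile: remove it and continue startup
    return True
```

A pid-only check can be fooled when the operating system has reused the pid for an unrelated
process, which is one reason to also compare the image name as point 1 proposes.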

Just a few minor things.  If you feel any of these things are worth implementing/changing,
I would be happy to work on it and submit a patch.  If not, no worries.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
