chukwa-dev mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CHUKWA-743) race condition in PidFile
Date Sun, 05 Apr 2015 00:27:33 GMT

    [ https://issues.apache.org/jira/browse/CHUKWA-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14396011#comment-14396011 ]

Eric Yang edited comment on CHUKWA-743 at 4/5/15 12:26 AM:
-----------------------------------------------------------

The PidFile class should be removed.  The POSIX file lock interface only works inside the same
process, not across multiple instances of a program.  An old trick was to bind to a port number
as an indicator of whether more than one instance of the program is running.  However, this
approach may not be safe, because a third party could connect to the bound port and cause a
race condition as well.  Hence, the Hadoop shell script approach is still the best solution:

{code}
if pid file exists and program running
  warn the user, it's already running
  exit 1
else
  start the program
  record pid
  sleep 1
{code}
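
The pseudocode above could be sketched in POSIX shell roughly as follows. This is a minimal illustration, not the actual Hadoop or Chukwa script: the function name `start_once` and its arguments are hypothetical, and the liveness test uses `kill -0`, as mentioned in the earlier version of this comment. The check and the start are not atomic, so two starters racing at the same instant could still both proceed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop or Chukwa):
#   start_once PIDFILE COMMAND [ARGS...]
start_once() {
  pidfile=$1
  shift
  # kill -0 delivers no signal; it only reports whether the pid can be signalled.
  if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
    echo "already running as pid $(cat "$pidfile")" >&2
    return 1
  fi
  # Start the program in the background and record its pid.
  "$@" &
  echo $! > "$pidfile"
  # Give the program a moment to come up, as in the pseudocode.
  sleep 1
}
```

A second call with the same pid file, for example `start_once /tmp/demo.pid sleep 30` run twice, should warn and return 1 while the first process is still alive.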


was (Author: eyang):
The PidFile class should be removed.  The POSIX file lock interface only works inside the same
process, not across multiple instances of a program.  An old trick was to bind to a port number
as an indicator of whether more than one instance of the program is running.  However, this
approach may not be safe, because a third party could connect to the bound port and cause a
race condition as well.  Hence, the Hadoop shell script approach is still the best solution:

{code}
if pid file exists, kill -0 to test program running.
  if program is running
    warn the user, it's already running
    exit 0
  else
    warn the user, it's not running
    exit 1
else
  start the program
  record pid
  sleep 1
{code}

> race condition in PidFile
> -------------------------
>
>                 Key: CHUKWA-743
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-743
>             Project: Chukwa
>          Issue Type: Bug
>            Reporter: Alan Snyder
>
> I believe there is a race condition in org.apache.hadoop.chukwa.util.PidFile. The problem
> is that the creation and deletion of the file are not protected by any lock. Client A can delete
> the file just before Client B tries to acquire a lock. If at that moment Client C tries to
> create the file, it will succeed. Client B and Client C will both succeed in acquiring a lock
> because there are two different files (one is hidden because it was deleted after being opened).
> I have tested similar code on OS X and this is what happened.
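
The "two different files" effect described in the report can be observed without any locking at all. The sketch below is only an illustration in POSIX shell, not the PidFile code: the path and the contents "old" and "new" are made up for the demonstration. One descriptor keeps the deleted file open while a new file is created under the same name.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
pidfile="$demo_dir/demo.pid"

echo "old" > "$pidfile"   # the original pid file
exec 3< "$pidfile"        # "Client B" opens it and keeps the descriptor
rm "$pidfile"             # "Client A" deletes the name
echo "new" > "$pidfile"   # "Client C" creates a fresh file under the same name

read -r held <&3              # what Client B still sees through its descriptor
read -r visible < "$pidfile"  # what anyone opening the path now sees

exec 3<&-
rm -rf "$demo_dir"
echo "held=$held visible=$visible"
```

Because the descriptor and the path now refer to different files, a lock taken on each would not conflict, which matches the behaviour reported above.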



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
