Date: Sun, 5 Apr 2015 00:22:33 +0000 (UTC)
From: "Eric Yang (JIRA)"
To: dev@chukwa.apache.org
Reply-To: dev@chukwa.apache.org
Mailing-List: contact dev-help@chukwa.apache.org; run by ezmlm
Subject: [jira] [Comment Edited] (CHUKWA-743) race condition in PidFile

[ https://issues.apache.org/jira/browse/CHUKWA-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14396011#comment-14396011 ]

Eric Yang edited comment on CHUKWA-743 at 4/5/15 12:22 AM:
-----------------------------------------------------------

The PidFile class should be removed. The POSIX file lock interface only works inside the same process, not across multiple instances of the program. An old trick was to bind to a port number as an indicator of whether more than one instance of the program is running. However, this approach may not be safe, because a third party could connect to the bound port and cause a race condition as well.
Hence, the Hadoop shell script approach is still the best solution:

{code}
if pid file exists, kill -0 to test program running.
  if program is running
    warn the user, it's already running
    exit 0
  else
    warn the user, it's not running
    exit 1
else
  start the program
  record pid
  sleep 1
{code}

was (Author: eyang):

The PidFile class should be removed. The POSIX file lock interface only works inside the same process, not across multiple instances of the program. An old trick was to bind to a port number as an indicator of whether more than one instance of the program is running. However, this approach may not be safe, because a third party could connect to the bound port and cause a race condition as well.

Hence, the Hadoop shell script approach is still the best solution:

{code}
if pid file exists, kill -0 to test program running.
  if program is running
    exit 1
else
  start the program
  record pid
  sleep 1
{code}

> race condition in PidFile
> -------------------------
>
>                 Key: CHUKWA-743
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-743
>             Project: Chukwa
>          Issue Type: Bug
>            Reporter: Alan Snyder
>
> I believe there is a race condition in org.apache.hadoop.chukwa.util.PidFile. The problem is that the creation and deletion of the file are not protected by any lock. Client A can delete the file just before Client B tries to acquire a lock. If at that moment Client C tries to create the file, it will succeed. Client B and Client C will both succeed in acquiring a lock because there are two different files (one is hidden because it was deleted after being opened). I have tested similar code on OS X and this is what happened.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
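
The PID-file check outlined in the edited comment can be sketched as a POSIX shell function. This is only an illustration of that outline, not the actual Hadoop or Chukwa script: the function name `start_daemon` and its arguments are hypothetical, and return codes stand in for the `exit` calls so the function can be reused.

```shell
#!/bin/sh
# Sketch of the PID-file check from the comment above.
# start_daemon and its parameters are hypothetical names chosen for this example.
#
# Usage: start_daemon PIDFILE COMMAND [ARGS...]
# Returns 0 if the recorded process is already running,
# returns 1 if a PID file exists but its process is gone,
# otherwise starts COMMAND in the background and records its PID.
start_daemon() {
    pidfile=$1
    shift

    if [ -f "$pidfile" ]; then
        pid=$(cat "$pidfile")
        # kill -0 delivers no signal; it only tests whether the PID can be signalled.
        if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
            echo "already running as process $pid" >&2
            return 0
        else
            echo "PID file $pidfile exists, but the process is not running" >&2
            return 1
        fi
    fi

    # No PID file: start the program, record its PID, and give it a moment to come up.
    "$@" &
    echo $! > "$pidfile"
    sleep 1
}
```

A call might look like `start_daemon /tmp/example.pid sleep 300`, where the path and command are placeholders. Like the outline it follows, this sketch leaves a window between the existence check and the write of the PID file, so two simultaneous invocations could both start the program.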