From: "Rahul Jain (Created) (JIRA)"
To: common-dev@hadoop.apache.org
Reply-To: common-dev@hadoop.apache.org
Date: Tue, 15 Nov 2011 00:04:51 +0000 (UTC)
Message-ID: <1681602691.28883.1321315491768.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Created] (HADOOP-7822) Hadoop startup script has a race condition : this causes failures in datanodes status and stop commands

Hadoop startup script has a race condition : this causes failures in datanodes status and stop commands
-------------------------------------------------------------------------------------------------------

                 Key: HADOOP-7822
                 URL: https://issues.apache.org/jira/browse/HADOOP-7822
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Rahul Jain


The symptoms are the following:

a) start-all.sh is able to start both the hadoop dfs and map-reduce processes, assuming the same grid nodes are used for dfs and map-reduce.

b) stop-all.sh stops map-reduce but fails to stop the dfs processes (datanode tasks on grid nodes). Instead, the warning message 'no datanode to stop' is seen for all data nodes.

c) The 'pid' files for the datanode processes do not exist, so the only way to stop the datanode processes is to manually execute kill commands.

The root cause of the issue appears to be in the hadoop startup scripts. start-all.sh really has two parts:

1. start-dfs.sh: starts the namenode and datanodes
2. start-mapred.sh: starts the jobtracker and task trackers

In this case, running start-dfs.sh worked as expected and created the pid files for the different datanodes. However, the start-mapred.sh script ended up forcing another rsync from master to slaves, effectively wiping out the pid files stored under the "pid" directory.
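
The sequence described in this report can be modelled in a few lines. The sketch below is not the Hadoop scripts themselves: it is a pure-Python stand-in that assumes the sync behaves like a mirroring copy which deletes anything on the slave that the master does not have. The names `mirror` and `simulate_start_all`, the file names, and the pid value are all hypothetical and chosen only for illustration.

```python
import os
import shutil
import tempfile


def mirror(src, dst, exclude=()):
    """Make dst match src, deleting extras in dst.

    A simplified stand-in for a mirroring sync (assumption: the real
    rsync call deletes files missing on the master). Names listed in
    `exclude` are skipped at the top level only.
    """
    os.makedirs(dst, exist_ok=True)
    # Remove anything on the destination that the source does not have.
    for name in os.listdir(dst):
        if name in exclude:
            continue
        if not os.path.lexists(os.path.join(src, name)):
            path = os.path.join(dst, name)
            if os.path.isdir(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
    # Copy the source contents over.
    for name in os.listdir(src):
        if name in exclude:
            continue
        s, d = os.path.join(src, name), os.path.join(dst, name)
        if os.path.isdir(s):
            mirror(s, d)
        else:
            shutil.copy2(s, d)


def simulate_start_all(exclude=()):
    """Return True if the datanode pid file survives both start phases."""
    with tempfile.TemporaryDirectory() as root:
        master = os.path.join(root, "master")
        slave = os.path.join(root, "slave")
        os.makedirs(os.path.join(master, "conf"))
        with open(os.path.join(master, "conf", "hadoop-env.sh"), "w") as f:
            f.write("# placeholder config\n")

        # Phase 1 (start-dfs.sh): sync, then the datanode records its pid.
        mirror(master, slave, exclude)
        pid_dir = os.path.join(slave, "pid")
        os.makedirs(pid_dir, exist_ok=True)
        pid_file = os.path.join(pid_dir, "datanode.pid")
        with open(pid_file, "w") as f:
            f.write("12345\n")  # hypothetical pid

        # Phase 2 (start-mapred.sh): a second sync runs from the master.
        mirror(master, slave, exclude)
        return os.path.exists(pid_file)


print(simulate_start_all())                  # False: the pid file is wiped
print(simulate_start_all(exclude=("pid",)))  # True: the pid directory is skipped
```

In this model the pid file disappears only because the pid directory lives inside the tree being mirrored. If that also holds for the real setup, keeping the pid directory outside the synced tree (for example through `HADOOP_PID_DIR`) or excluding it from the sync would be possible workarounds; neither is confirmed by this report.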