hadoop-common-issues mailing list archives

From "Rahul Jain (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-7822) Hadoop startup script has a race condition : this causes failures in datanodes status and stop commands
Date Tue, 15 Nov 2011 00:08:52 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-7822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Rahul Jain updated HADOOP-7822:

    Affects Version/s: 0.20.1

Our workaround for this issue has been to add an exclusion filter for the pid files to the 'rsync' command in hadoop-daemon.sh. The original command:

  rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*'

Updated with fix:
 rsync -a -e ssh --delete --exclude='pids/*.pid' --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*'
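
The effect of the extra exclude can be checked locally, without a cluster. The sketch below is only an illustration and is not part of the Hadoop scripts: the function name `sync_tree` and the directory arguments are hypothetical, it syncs between two local directories instead of going over ssh, and it assumes rsync is installed and that the pid files live in a `pids` directory inside the synced tree.

```shell
#!/bin/sh
# Hypothetical helper: mirror SRC into DST with --delete, as the startup
# script does, optionally protecting pid files on the receiving side.
# Usage: sync_tree SRC DST [protect]
sync_tree() {
    src=$1
    dst=$2
    if [ "${3:-}" = "protect" ]; then
        # With --delete, files matching an --exclude pattern are neither
        # transferred nor deleted on the receiver.
        rsync -a --delete --exclude='pids/*.pid' --exclude=.svn \
            --exclude='logs/*' "$src/" "$dst/"
    else
        # Without the exclude, any pid file that exists only on the
        # receiver is treated as extraneous and removed.
        rsync -a --delete --exclude=.svn --exclude='logs/*' "$src/" "$dst/"
    fi
}
```

For example, if `slave/pids/hadoop-datanode.pid` exists only on the receiving side, `sync_tree master slave` removes it, while `sync_tree master slave protect` leaves it in place.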

> Hadoop startup script has a race condition : this causes failures in datanodes status and stop commands
> -------------------------------------------------------------------------------------------------------
>                 Key: HADOOP-7822
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7822
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.20.1, 0.20.2,
>            Reporter: Rahul Jain
> The symptoms are the following:
> a) start-all.sh starts both the Hadoop DFS and map-reduce processes, assuming the same grid nodes are used for DFS and map-reduce.
> b) stop-all.sh stops map-reduce but fails to stop dfs processes (datanode tasks on grid
>     Instead, the warning message 'no datanode to stop' is seen for all data nodes.
> c) The 'pid' files for the datanode processes no longer exist, so the only way to stop the datanode processes is to run kill commands manually.
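
Symptoms (b) and (c) follow from how a pid-file-based stop command typically works. The following is a simplified sketch of that logic, not the actual hadoop-daemon.sh code; the function name `stop_daemon` and its arguments are hypothetical.

```shell
#!/bin/sh
# Simplified sketch of a pid-file-based stop command (hypothetical helper).
# Usage: stop_daemon PIDFILE NAME
stop_daemon() {
    pidfile=$1
    name=$2
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        # The pid file exists and the recorded process is alive: stop it.
        echo "stopping $name"
        kill "$(cat "$pidfile")"
    else
        # Also reached when the pid file is missing, even though the
        # daemon itself may still be running.
        echo "no $name to stop"
    fi
}
```

Once the pid file is gone, the stop command has no way to find the process, so it reports that there is nothing to stop and leaves the daemon running.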
> The root cause of the issue appears to be in the Hadoop startup scripts. start-all.sh really consists of two parts:
> 1. start-dfs.sh: starts the namenode and datanodes.
> 2. start-mapred.sh: starts the jobtracker and tasktrackers.
> In this case, running start-dfs.sh worked as expected and created the pid files for the datanodes. However, the start-mapred.sh script then forced another rsync from the master to the slaves, which wiped out the pid files stored under the "pid" directory.
