hadoop-hdfs-issues mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2045) HADOOP_*_HOME environment variables no longer work for tar ball distributions
Date Tue, 07 Jun 2011 22:54:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045694#comment-13045694 ]

Eric Yang commented on HDFS-2045:
---------------------------------

HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, and HADOOP_MAPRED_HOME were the result of splitting the
source code into three different submodules.  While this works fine for developers who want to
isolate each project, it makes configuration difficult for production use.  HDFS and MAPRED run
under their own uids, and the amount of configuration simply multiplies.
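
For illustration only (the install paths below are hypothetical, not taken from any patch),
running HDFS and MAPRED under separate uids today means each process environment has to carry
something like:

    export HADOOP_COMMON_HOME=/opt/hadoop-common-0.23.0-SNAPSHOT
    export HADOOP_HDFS_HOME=/opt/hadoop-hdfs-0.23.0-SNAPSHOT
    export HADOOP_MAPRED_HOME=/opt/hadoop-mapreduce-0.23.0-SNAPSHOT
    export PATH=$HADOOP_COMMON_HOME/bin:$HADOOP_HDFS_HOME/bin:$HADOOP_MAPRED_HOME/bin:$PATH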

To solve this problem, there are a couple of options:

Option 1.  Ship all of the common shell scripts inside the common jar file.  When the binary
tarball is built, the common shell scripts are extracted and merged into the binary tarball
distribution, and the HADOOP_*_HOME environment variables are removed completely.  $HADOOP_PREFIX
(derived from the shell script's path, so there is no need to define it in the environment)
becomes the only hint telling Hadoop programs exactly how the bits are laid out.  When HDFS or
MAPREDUCE is deployed, there is no need to deploy the COMMON tarball.  To make this work for
developers, *-config.sh should be moved to $HADOOP_PREFIX/libexec.  During the build process,
hadoop-common-*.jar is extracted to obtain the common shell scripts.  This brings the developer
and binary layouts closer to each other.  (When the project is converted to Maven, this keeps
hdfs/mapreduce loosely coupled and reduces duplicated shell scripts.)
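
A minimal sketch of the $HADOOP_PREFIX idea in option 1, assuming the launcher scripts live in
$HADOOP_PREFIX/bin and *-config.sh has been moved to $HADOOP_PREFIX/libexec (illustrative only,
not the actual patch):

    # Derive the prefix from the script's own location instead of any HADOOP_*_HOME variable.
    bin=$(cd -P -- "$(dirname -- "${BASH_SOURCE-$0}")" && pwd -P)
    HADOOP_PREFIX=$(cd -P -- "$bin/.." && pwd -P)
    export HADOOP_PREFIX
    # Every hadoop/hdfs/mapred launcher then sources the shared config from libexec.
    . "$HADOOP_PREFIX/libexec/hadoop-config.sh"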

Option 2.  Preserve HADOOP_*_HOME for running from a source checkout.  The environment-driven
layout does not work for the binary tarball.  Change the tarball prefix from
hadoop-[common|mapred|hdfs]-0.23.0-SNAPSHOT to hadoop-[version] for easy extraction.
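
One way to read the "easy extraction" point (tarball file names assumed here for illustration)
is that the common, hdfs, and mapred tarballs would then all unpack on top of each other into
a single hadoop-0.23.0-SNAPSHOT/ tree:

    tar xzf hadoop-common-0.23.0-SNAPSHOT-bin.tar.gz   # unpacks into hadoop-0.23.0-SNAPSHOT/
    tar xzf hadoop-hdfs-0.23.0-SNAPSHOT-bin.tar.gz     # unpacks into hadoop-0.23.0-SNAPSHOT/
    tar xzf hadoop-mapred-0.23.0-SNAPSHOT-bin.tar.gz   # unpacks into hadoop-0.23.0-SNAPSHOT/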

Option 3.  Enable HADOOP_*_HOME for the binary tarball.  (This risks crashing the system due to
a bad environment variable setup.)

Option 4.  Merge hdfs/mapreduce back into the same project, but keep them as subdirectories to
reduce duplicated shell scripts.

I am inclined to vote for option 2.

> HADOOP_*_HOME environment variables no longer work for tar ball distributions
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-2045
>                 URL: https://issues.apache.org/jira/browse/HDFS-2045
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Aaron T. Myers
>
> It used to be that you could do the following:
> # Run `ant bin-package' in your hadoop-common checkout.
> # Set HADOOP_COMMON_HOME to the built directory of hadoop-common.
> # Run `ant bin-package' in your hadoop-hdfs checkout.
> # Set HADOOP_HDFS_HOME to the built directory of hadoop-hdfs.
> # Set PATH to have HADOOP_HDFS_HOME/bin and HADOOP_COMMON_HOME/bin on it.
> # Run `hdfs'.
>
> As of HDFS-1963, this no longer works since hdfs-config.sh is looking in HADOOP_COMMON_HOME/bin/
> for hadoop-config.sh, but it's being placed in HADOOP_COMMON_HOME/libexec.
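
For reference, the workflow quoted above boils down to roughly this shell session (checkout and
build directory names are illustrative, not from the report):

    cd ~/src/hadoop-common && ant bin-package
    export HADOOP_COMMON_HOME=~/src/hadoop-common/build/hadoop-common-0.23.0-SNAPSHOT
    cd ~/src/hadoop-hdfs && ant bin-package
    export HADOOP_HDFS_HOME=~/src/hadoop-hdfs/build/hadoop-hdfs-0.23.0-SNAPSHOT
    export PATH=$HADOOP_HDFS_HOME/bin:$HADOOP_COMMON_HOME/bin:$PATH
    hdfs   # breaks after HDFS-1963: hdfs-config.sh looks for hadoop-config.sh in
           # $HADOOP_COMMON_HOME/bin/, but the build now places it under libexec/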

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
