hadoop-common-dev mailing list archives

From Sangjin Lee <sjl...@gmail.com>
Subject Re: use of HADOOP_HOME
Date Thu, 28 May 2015 18:29:41 GMT
Thanks Chris and Allen for the info! Yes, we can use HADOOP_PREFIX
until/unless HADOOP-11393 is resolved.

Just to clarify, we're not setting HADOOP_HOME/HADOOP_PREFIX in our
*-env.sh; we simply use them. I don't know that it is always feasible to
set them at the machine level. Some setups may have multiple hadoop
installs and want to switch between them, and so on.
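
For illustration, here is a minimal sketch of how a site-specific *-env.sh could
use these variables without setting them. The function name, the variable
MY_EXTRA_CONF_DIR, and the /opt/hadoop default are hypothetical and not taken
from any Hadoop release. It prefers HADOOP_PREFIX and falls back to HADOOP_HOME
only when HADOOP_PREFIX is unset or empty.

```shell
# Hypothetical fragment for a site-specific hadoop-env.sh.
# Nothing here sets HADOOP_PREFIX or HADOOP_HOME; it only reads them.

# Print the Hadoop location: HADOOP_PREFIX first, then HADOOP_HOME,
# then an assumed default (/opt/hadoop is illustrative only).
site_hadoop_location() {
  printf '%s\n' "${HADOOP_PREFIX:-${HADOOP_HOME:-/opt/hadoop}}"
}

# Derive a site-specific variable from the resolved location.
# MY_EXTRA_CONF_DIR is a made-up name for this example.
MY_EXTRA_CONF_DIR="$(site_hadoop_location)/etc/extra"
export MY_EXTRA_CONF_DIR
```

Switching between installs then only requires exporting a different
HADOOP_PREFIX before running the shell commands.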

On Thu, May 28, 2015 at 10:13 AM, Allen Wittenauer <aw@altiscale.com> wrote:

> On May 28, 2015, at 9:36 AM, Sangjin Lee <sjlee@apache.org> wrote:
> > Hi folks,
> >
> > I noticed this while setting up a cluster based on the current trunk. It
> > appears that setting HADOOP_HOME is now done much later (in
> > hadoop_finalize) than branch-2. Importantly this is set *after*
> > hadoop-env.sh (or yarn-env.sh) is invoked.
> >
> > In our version of hadoop-env.sh, we have used $HADOOP_HOME to define some
> > more variables, but it appears that we can no longer rely on the
> > HADOOP_HOME value in our *-env.sh customization. Is this an intended change
> > in the recent shell script refactoring? What is the right thing to use in
> > hadoop-env.sh for the location of hadoop?
>         a) HADOOP_HOME was deprecated on Unix systems as part of (IIRC)
> 0.21.  HADOOP_PREFIX was its replacement.  (No, I never understood the
> reasoning for this either.)  Past 0.21, it was never safe to rely upon
> HADOOP_HOME in *-env.sh files unless it is set prior to running the shell
> commands.
>         b) That said, functionality-wise, HADOOP_HOME is being set in
> pretty much the same place in the code flow.  *-env.sh has already been
> processed in both branch-2 and trunk by the time HADOOP_HOME is
> configured.  trunk only configures HADOOP_HOME for backward compatibility.
> The rest of the code uses HADOOP_PREFIX as expected and very early in
> the lifecycle.
>         What you are likely seeing is the result of a bug fix:  trunk
> doesn't reprocess *-env.sh files when using the sbin commands whereas
> branch-2 does it several times over. (This is also one of the reasons why
> Java command line options are duplicated too.)  So it likely worked for you
> because of this broken behavior.
>         In my mind, it is a better practice to configure
> HADOOP_HOME/HADOOP_PREFIX outside of the *-env.sh files (e.g.,
> /etc/profile.d on Linux) so that one can use them for PATH, etc.  That
> should guarantee expected behavior.
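
The suggestion above could be sketched roughly as follows. This is only an
illustration: the file name /etc/profile.d/hadoop.sh and the /opt/hadoop path
are assumptions, not something stated in this thread. A value of HADOOP_PREFIX
that is already set in the environment is left alone, so a user with several
installs can still pick one before login scripts run.

```shell
# Hypothetical /etc/profile.d/hadoop.sh; the install path is an assumption.

# Keep any HADOOP_PREFIX already in the environment; otherwise use a default.
HADOOP_PREFIX="${HADOOP_PREFIX:-/opt/hadoop}"
export HADOOP_PREFIX

# Add the bin directory to PATH only if it is not there yet,
# so sourcing this file more than once does not duplicate the entry.
case ":${PATH}:" in
  *":${HADOOP_PREFIX}/bin:"*) ;;
  *) PATH="${HADOOP_PREFIX}/bin:${PATH}" ;;
esac
export PATH
```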
