hadoop-common-issues mailing list archives

From "Steve Loughran (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7939) Improve Hadoop subcomponent integration in Hadoop 0.23
Date Sat, 31 Dec 2011 21:37:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178081#comment-13178081
] 

Steve Loughran commented on HADOOP-7939:
----------------------------------------

I'm going to start by saying I couldn't get the tarball to start up. Here are some of the
problems I hit:

 * HADOOP-7838 - sbin/start-balancer doesn't work
 * MAPREDUCE-3430 - Shell variable expansions in yarn shell scripts should be quoted
 * MAPREDUCE-3431 - NPE in Resource Manager shutdown
 * MAPREDUCE-3432 - Yarn doesn't work if JAVA_HOME isn't set

The key problem was the number of env variables to set, something wrong with env propagation
(MAPREDUCE-3432 shows this), no "how to get up and running in 5 minutes" document, and the
fact that some shell scripts contain assumptions about code layout that aren't valid; HADOOP-7838
shows this.
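
MAPREDUCE-3432 is the kind of failure a fail-fast guard at the top of every launcher script
would turn into a clear error; a minimal sketch, with the wording and placement illustrative
rather than taken from the actual yarn script:

{code}
# Fail fast if JAVA_HOME is missing, rather than dying later with an
# obscure "java: command not found" or an NPE inside a daemon.
if [ -z "$JAVA_HOME" ]; then
  echo "Error: JAVA_HOME is not set." >&2
  echo "Set it in the environment or in yarn-env.sh." >&2
  exit 1
fi
JAVA="$JAVA_HOME/bin/java"
{code}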

There's probably an underlying problem: no testing that the tarball works when deployed onto
a clean OS into a directory with a space in it somewhere up the tree. This isn't that hard
to write; a few ant tasks to <scp> the file then <ssh> some commands -and without
it you can't be sure such problems have gone away and won't come back.
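
The same end-to-end check could be driven from Ant's &lt;scp&gt; and &lt;ssh&gt; tasks; in plain
shell the skeleton is roughly the following, where the host name, tarball version and target
path are placeholders:

{code}
#!/bin/sh
# Smoke-test the tarball on a clean machine, in a path containing a space.
HOST=clean-test-vm
TARBALL=hadoop-0.23.1.tar.gz
TARGET='/opt/hadoop test'

scp "$TARBALL" "$HOST:/tmp/"
ssh "$HOST" "mkdir -p '$TARGET' && tar -xzf '/tmp/$TARBALL' -C '$TARGET'"
# Any launcher script that mishandles quoting or layout fails here.
ssh "$HOST" "cd '$TARGET'/hadoop-* && sbin/start-dfs.sh"
{code}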

If I have that problem, I expect end users will too, and I fear for the traffic on hadoop-*-users.
That's not just pain and suffering; it will cause people not to use Hadoop. Because you don't
pay for a free download, you haven't put enough money on the table to justify spending a day
getting the thing up and running on your desktop. Any bad installation experience will put people off.

Tom White's goal of "one single env variable" is what I'd like. Set that, have the others drive
off it (unless overridden) -and work it out based on bin/something if it isn't predefined.
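
A sketch of what that could look like at the top of a launcher, using the existing 0.23
variable names; deriving from $0 is the "work it out based on bin/something" part:

{code}
# Derive HADOOP_HOME from this script's own location unless the caller
# has already set it, then default everything else off it, still
# honouring any individual overrides.
HADOOP_HOME="${HADOOP_HOME:-$(cd "$(dirname "$0")/.." && pwd)}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_HOME/conf}"
HADOOP_LOG_DIR="${HADOOP_LOG_DIR:-$HADOOP_HOME/logs}"
{code}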


Looking at this proposal:

# I like the idea of a standard layout that can be tuned, so that we have the option to point
to different versions of things if need be, but you don't need to set up everything in advance.
# You can't rely on symlinks in Windows-land, which, given the recent MS support for Hadoop
on Azure, may matter in production as well as dev. And remember, those Windows desktop installs
probably form the majority of single-user deployments.
# Windows also has a hard limit of 1024 chars on command lines; it's the thing that tops out
first on long classpaths (forcing you to set the CLASSPATH env variable and then call java, but
even that has limits).
# We need some tests. I know BigTop does this, but I would like some pushed earlier into
the process, so that all HADOOP-, HDFS- and MAPREDUCE- patches get regression tested against the
scripts in their initial test runs.
# Todd's points about config, tmp &c raise another point: per-user options and temp dirs
should be in different paths from the binaries. I don't want the temp files on the root disk,
and just because Hadoop was installed by root doesn't mean I shouldn't be able to run Hadoop
with my own config.
# Redirectable config/tmp also makes it trivial to play with different installation options
without editing conf files; see the sketch after this list.
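
On the config/tmp points, the existing --config switch already shows the shape of it. A minimal
sketch of running a root-installed Hadoop with per-user config and temp dirs (HADOOP_CONF_DIR
and --config exist in 0.23; the tmp variable name here is hypothetical, pending whatever name
this proposal settles on):

{code}
# Point at a per-user config dir; no symlinks or edits in the install tree.
export HADOOP_CONF_DIR="$HOME/hadoop-conf"
# Keep temp data off the root disk (hypothetical variable name).
export HADOOP_TMP_DIR="/scratch/$USER/hadoop-tmp"
bin/hadoop fs -ls /

# Equivalent one-off form:
bin/hadoop --config "$HOME/hadoop-conf" fs -ls /
{code}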

In an ideal world we'd also replace the bash scripts with Python: it's a more readable/editable
language, less quirky, and it sets things up for more Python-round-the-edges work. I don't know
enough about Python on Windows to know the consequences of that; I'd expect Python to be native
(not Cygwin). I'll put that to one side for now.

For me, then:
 * A root hadoop dir with everything laid out underneath it is good.
 * I would like a way to point to my own config/tmp dirs without needing to edit symlinks.
 * This stuff needs to work on Windows too.
 * The tarball needs installation tests.





                
> Improve Hadoop subcomponent integration in Hadoop 0.23
> ------------------------------------------------------
>
>                 Key: HADOOP-7939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7939
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build, conf, documentation, scripts
>    Affects Versions: 0.23.0
>            Reporter: Roman Shaposhnik
>            Assignee: Roman Shaposhnik
>             Fix For: 0.23.1
>
>
> h1. Introduction
> For the rest of this proposal it is assumed that the current set
> of Hadoop subcomponents is:
>  * hadoop-common
>  * hadoop-hdfs
>  * hadoop-yarn
>  * hadoop-mapreduce
> It must be noted that this is an open-ended list, though. For example,
> implementations of additional frameworks on top of yarn (e.g. MPI) would
> also be considered a subcomponent.
> h1. Problem statement
> Currently there's an unfortunate coupling and hard-coding present at the
> level of launcher scripts, configuration scripts and Java implementation
> code that prevents us from treating all subcomponents of Hadoop independently
> of each other. In a lot of places it is assumed that bits and pieces
> from individual subcomponents *must* be located at predefined places
> and they cannot be dynamically registered/discovered at runtime.
> This prevents a truly flexible deployment of Hadoop 0.23. 
> h1. Proposal
> NOTE: this is NOT a proposal for redefining the layout from HADOOP-6255. 
> The goal here is to keep as much of that layout in place as possible,
> while permitting different deployment layouts.
> The aim of this proposal is to introduce the needed level of indirection and
> flexibility in order to accommodate the current assumed layout of Hadoop tarball
> deployments and all the other styles of deployments as well. To this end the
> following set of environment variables needs to be uniformly used in all of
> the subcomponent's launcher scripts, configuration scripts and Java code
> (<SC> stands for a literal name of a subcomponent). These variables are
> expected to be defined by <SC>-env.sh scripts and sourcing those files is
> expected to have the desired effect of setting the environment up correctly.
>   # HADOOP_<SC>_HOME
>    ## root of the subtree in a filesystem where a subcomponent is expected to be installed
>    ## default value: $0/..
>   # HADOOP_<SC>_JARS 
>    ## a subdirectory with all of the jar files comprising the subcomponent's implementation
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)
>   # HADOOP_<SC>_EXT_JARS
>    ## a subdirectory with all of the jar files needed for the extended functionality of the subcomponent (not essential for the basic functionality to work correctly)
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/ext
>   # HADOOP_<SC>_NATIVE_LIBS
>    ## a subdirectory with all the native libraries that the component requires
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/native
>   # HADOOP_<SC>_BIN
>    ## a subdirectory with all of the launcher scripts specific to the client side of the component
>    ## default value: $(HADOOP_<SC>_HOME)/bin
>   # HADOOP_<SC>_SBIN
>    ## a subdirectory with all of the launcher scripts specific to the server/system side of the component
>    ## default value: $(HADOOP_<SC>_HOME)/sbin
>   # HADOOP_<SC>_LIBEXEC
>    ## a subdirectory with all of the launcher scripts that are internal to the implementation and should *not* be invoked directly
>    ## default value: $(HADOOP_<SC>_HOME)/libexec
>   # HADOOP_<SC>_CONF
>    ## a subdirectory containing configuration files for a subcomponent
>    ## default value: $(HADOOP_<SC>_HOME)/conf
>   # HADOOP_<SC>_DATA
>    ## a subtree in the local filesystem for storing the component's persistent state
>    ## default value: $(HADOOP_<SC>_HOME)/data
>   # HADOOP_<SC>_LOG
>    ## a subdirectory where the subcomponent's log files are stored
>    ## default value: $(HADOOP_<SC>_HOME)/log
>   # HADOOP_<SC>_RUN
>    ## a subdirectory with runtime system specific information
>    ## default value: $(HADOOP_<SC>_HOME)/run
>   # HADOOP_<SC>_TMP
>    ## a subdirectory with temporary files
>    ## default value: $(HADOOP_<SC>_HOME)/tmp
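
To make the defaults above concrete, here is a minimal sketch of an <SC>-env.sh instantiated
for hdfs (the file name and the hdfs instantiation are illustrative; each variable honours a
pre-set override and otherwise falls back to the proposal's default):

{code}
# hdfs-env.sh (sketch): sourced by the hdfs launcher scripts, so $0 is
# the launcher under bin/ and $0/.. is the install root.
HADOOP_HDFS_HOME="${HADOOP_HDFS_HOME:-$(cd "$(dirname "$0")/.." && pwd)}"
HADOOP_HDFS_JARS="${HADOOP_HDFS_JARS:-$HADOOP_HDFS_HOME/share/hadoop/hdfs}"
HADOOP_HDFS_EXT_JARS="${HADOOP_HDFS_EXT_JARS:-$HADOOP_HDFS_HOME/share/hadoop/hdfs/ext}"
HADOOP_HDFS_NATIVE_LIBS="${HADOOP_HDFS_NATIVE_LIBS:-$HADOOP_HDFS_HOME/share/hadoop/hdfs/native}"
HADOOP_HDFS_BIN="${HADOOP_HDFS_BIN:-$HADOOP_HDFS_HOME/bin}"
HADOOP_HDFS_SBIN="${HADOOP_HDFS_SBIN:-$HADOOP_HDFS_HOME/sbin}"
HADOOP_HDFS_CONF="${HADOOP_HDFS_CONF:-$HADOOP_HDFS_HOME/conf}"
HADOOP_HDFS_LOG="${HADOOP_HDFS_LOG:-$HADOOP_HDFS_HOME/log}"
HADOOP_HDFS_TMP="${HADOOP_HDFS_TMP:-$HADOOP_HDFS_HOME/tmp}"
{code}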

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
