From "Roman Shaposhnik (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7939) Improve Hadoop subcomponent integration in Hadoop 0.23
Date Wed, 28 Dec 2011 00:32:31 GMT

Roman Shaposhnik commented on HADOOP-7939:


No disagreement, except for the fact that the current implementation of passing on the 'standard'
deps for applications is suboptimal, since it cannot be fully parameterized. That's all I'm
> Improve Hadoop subcomponent integration in Hadoop 0.23
> ------------------------------------------------------
>                 Key: HADOOP-7939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7939
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build, conf, documentation, scripts
>    Affects Versions: 0.23.0
>            Reporter: Roman Shaposhnik
>            Assignee: Roman Shaposhnik
>             Fix For: 0.23.1
> h1. Introduction
> For the rest of this proposal it is assumed that the current set
> of Hadoop subcomponents is:
>  * hadoop-common
>  * hadoop-hdfs
>  * hadoop-yarn
>  * hadoop-mapreduce
> Note, however, that this is an open-ended list. For example,
> implementations of additional frameworks on top of YARN (e.g. MPI) would
> also be considered subcomponents.
> h1. Problem statement
> Currently there's an unfortunate coupling and hard-coding present at the
> level of launcher scripts, configuration scripts and Java implementation
> code that prevents us from treating all subcomponents of Hadoop independently
> of each other. In a lot of places it is assumed that bits and pieces
> from individual subcomponents *must* be located at predefined places
> and cannot be dynamically registered or discovered at runtime.
> This prevents a truly flexible deployment of Hadoop 0.23.
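As a minimal sketch of the runtime discovery this problem statement asks for, a launcher could enumerate subcomponents from the environment instead of hard-coding them. It assumes each installed subcomponent exports the HADOOP_<SC>_HOME root variable the proposal uses in its default values, with <SC> upper-cased (e.g. HADOOP_HDFS_HOME); the function name `discover_subcomponents` is hypothetical.

```python
import re
from typing import Dict, Mapping

# Assumption: <SC> appears upper-cased in variable names, e.g. HADOOP_HDFS_HOME.
# HADOOP_HOME and HADOOP_CONF_DIR deliberately do not match this pattern.
_HOME_VAR = re.compile(r"HADOOP_([A-Z0-9]+(?:_[A-Z0-9]+)*)_HOME")


def discover_subcomponents(env: Mapping[str, str]) -> Dict[str, str]:
    """Return {subcomponent name: install root} for each non-empty HADOOP_<SC>_HOME.

    The set is whatever the environment defines, so a new subcomponent
    (say, an MPI framework on YARN) is picked up without code changes.
    """
    found = {}
    for name, value in env.items():
        match = _HOME_VAR.fullmatch(name)
        if match and value:
            found[match.group(1).lower()] = value
    # Sort so callers iterate in a deterministic order.
    return dict(sorted(found.items()))
```

Empty values are skipped, so an unset or blanked-out variable does not register a subcomponent.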
> h1. Proposal
> NOTE: this is NOT a proposal for redefining the layout from HADOOP-6255. 
> The goal here is to keep as much of that layout in place as possible,
> while permitting different deployment layouts.
> The aim of this proposal is to introduce the level of indirection and
> flexibility needed to accommodate both the currently assumed layout of Hadoop tarball
> deployments and all other deployment styles. To this end, the
> following set of environment variables needs to be used uniformly in all of
> the subcomponents' launcher scripts, configuration scripts and Java code
> (<SC> stands for the literal name of a subcomponent). These variables are
> expected to be defined by <SC>-env.sh scripts, and sourcing those files is
> expected to set up the environment correctly.
>    ## root of the subtree in a filesystem where a subcomponent is expected to be installed
>    ## default value: $0/..
>    ## a subdirectory with all of the jar files comprising the subcomponent's implementation
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)
>    ## a subdirectory with all of the jar files needed for extended functionality of the subcomponent (not essential for the basic functionality to work correctly)
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/ext
>    ## a subdirectory with all of the native libraries that the component requires
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/native
>    ## a subdirectory with all of the launcher scripts specific to the client side of the component
>    ## default value: $(HADOOP_<SC>_HOME)/bin
>    ## a subdirectory with all of the launcher scripts specific to the server/system side of the component
>    ## default value: $(HADOOP_<SC>_HOME)/sbin
>    ## a subdirectory with all of the launcher scripts that are internal to the implementation and should *not* be invoked directly
>    ## default value: $(HADOOP_<SC>_HOME)/libexec
>    ## a subdirectory containing configuration files for a subcomponent
>    ## default value: $(HADOOP_<SC>_HOME)/conf
>    ## a subtree in the local filesystem for storing the component's persistent state
>    ## default value: $(HADOOP_<SC>_HOME)/data
>    ## a subdirectory where the subcomponent's log files are stored
>    ## default value: $(HADOOP_<SC>_HOME)/log
>    ## a subdirectory with runtime system specific information
>    ## default value: $(HADOOP_<SC>_HOME)/run
>    ## a subdirectory with temporary files
>    ## default value: $(HADOOP_<SC>_HOME)/tmp
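To make the default values listed above concrete, here is a minimal Python sketch that resolves one subcomponent's directories from HADOOP_<SC>_HOME. Several details are assumptions rather than part of the proposal: the `Layout` field names and `resolve_layout` are hypothetical (they stand in for the per-directory variables without naming them), <SC> is taken to be upper-cased in variable names and lower-cased in paths (hdfs maps to HADOOP_HDFS_HOME and share/hadoop/hdfs), and "$0/.." is read as the directory above the one containing the launcher script.

```python
from dataclasses import dataclass, replace
from pathlib import PurePosixPath
from typing import Mapping, Optional


@dataclass(frozen=True)
class Layout:
    """Resolved directories for one subcomponent (field names are hypothetical)."""
    home: PurePosixPath
    jars: PurePosixPath      # jars comprising the implementation
    ext_jars: PurePosixPath  # optional jars for extended functionality
    native: PurePosixPath    # native libraries
    bin: PurePosixPath       # client-side launcher scripts
    sbin: PurePosixPath      # server/system-side launcher scripts
    libexec: PurePosixPath   # internal scripts, not invoked directly
    conf: PurePosixPath
    data: PurePosixPath
    log: PurePosixPath
    run: PurePosixPath
    tmp: PurePosixPath


def resolve_layout(sc: str, launcher: str, env: Mapping[str, str],
                   overrides: Optional[Mapping[str, str]] = None) -> Layout:
    """Apply the default values, then any explicit per-directory overrides.

    `overrides` is keyed by Layout field name and stands in for whatever
    variables an <SC>-env.sh would export for the individual directories.
    """
    home_value = env.get(f"HADOOP_{sc.upper()}_HOME")
    if home_value:
        home = PurePosixPath(home_value)
    else:
        # Assumption: "$0/.." means the parent of the launcher's directory,
        # e.g. /usr/lib/hadoop-hdfs/bin/hdfs -> /usr/lib/hadoop-hdfs.
        home = PurePosixPath(launcher).parent.parent
    share = home / "share" / "hadoop" / sc.lower()
    layout = Layout(
        home=home, jars=share, ext_jars=share / "ext", native=share / "native",
        bin=home / "bin", sbin=home / "sbin", libexec=home / "libexec",
        conf=home / "conf", data=home / "data", log=home / "log",
        run=home / "run", tmp=home / "tmp",
    )
    if overrides:
        # Overrides replace single entries without re-deriving the others;
        # replace() raises TypeError for an unknown field name.
        layout = replace(layout, **{k: PurePosixPath(v) for k, v in overrides.items()})
    return layout
```

Using `PurePosixPath` keeps the resolution purely lexical, so a layout can be computed and checked before anything is installed on disk.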

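The proposal expects these variables to be set by sourcing <SC>-env.sh. The following Python sketch shows one way a tool that is not itself a shell script could capture the environment such a script produces. It assumes `bash` is on PATH, and the helper name `env_after_sourcing` is hypothetical. It also makes one requirement visible: only *exported* variables reach child processes, so an <SC>-env.sh has to `export` its settings for launchers to see them.

```python
import json
import os
import shutil
import subprocess
import sys
from typing import Dict, Mapping, Optional

# Python snippet run after sourcing; prints the environment as JSON.
_DUMP_ENV = "import json, os; print(json.dumps(dict(os.environ)))"


def env_after_sourcing(script: str,
                       base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the exported environment a bash shell has after sourcing `script`."""
    bash = shutil.which("bash")
    if bash is None:
        raise RuntimeError("bash not found on PATH")
    env = dict(os.environ if base_env is None else base_env)
    # $1 = script to source (absolute, so '.' does not search PATH),
    # $2 = Python interpreter used to dump the resulting environment.
    # The script's own stdout goes to stderr so it cannot corrupt the JSON.
    command = f'. "$1" 1>&2 && exec "$2" -c "{_DUMP_ENV}"'
    proc = subprocess.run(
        [bash, "-c", command, "bash", os.path.abspath(script), sys.executable],
        env=env, check=True, capture_output=True, text=True,
    )
    return json.loads(proc.stdout)
```

If the sourced script fails, `&&` skips the dump, bash exits non-zero, and `check=True` raises `subprocess.CalledProcessError`.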
