hadoop-common-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12857) Rework hadoop-tools-dist
Date Wed, 02 Mar 2016 18:42:18 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176220#comment-15176220 ]

Allen Wittenauer commented on HADOOP-12857:
-------------------------------------------

I have some sample code working.  It was very enlightening and I know what to do now.  If
we really do want to keep one directory, here's my current plan of attack:

* Truly optional components (s3, azure, openstack, kafka, etc.) will have a shellprofile built
that users can enable with the necessary incantations.  I'm currently thinking I might be able
to add content to hadoop-env.sh at build time so that these can be turned on via a single
env-var setting, or one per feature. No promises.  (Yes, I'm currently looking for my "Black
Hat of Bash Wizardry" to make this happen.) Worst case, it'll be "copy and rename to
HADOOP_CONF_DIR".

* With some help from [~raviprak] to make me see the forest for the trees, I can now build
shell-parseable dependency lists at build time.  I have two ways I can process these:  I can
either store the lists in the hadoop-dist target directory, or store them in the target
directory of the actual tools under a well-known name and use find to build the necessary
shell magic at build time.  I'm leaning towards the latter, since that will allow mvn clean to
work in hadoop-dist in the expected way: there won't be a hidden dependency on hadoop-tools
having been built before the mvn package.

* distch, distcp, archive-logs, etc. are extremely problematic. Using shell profiles for these
WILL NOT WORK, since a) they aren't really optional and b) removing them from the command line
tools won't really help anyone.  Currently these commands load all of HADOOP_TOOLS_PATH, which
is awful. I want to add a tools directory to libexec/ that stores helper functions for the
tools jars required by the various subcommands.  It will use code similar to, but separate
from, the optional components' code.  It will key off a different filename for the dependency
list, and there will need to be a contract between the helper function names and the
dependency file names.  (This sounds worse than it is.)

I *wish* there were a way to dynamically add subcommands to hadoop, mapred, etc., but the code
just isn't quite there yet.  We can do usage now, but not actual execution.

One big question: How should this work proceed?
# Single patch
# Multiple patches with a strict commit dependency order
# Separate branch followed by a branch merge

Given that this work will likely be all or nothing, I'm not a fan of multiple patches.

> Rework hadoop-tools-dist
> ------------------------
>
>                 Key: HADOOP-12857
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12857
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build
>    Affects Versions: 3.0.0
>            Reporter: Allen Wittenauer
>            Assignee: Allen Wittenauer
>
> As hadoop-tools grows bigger and bigger, it's becoming evident that having a single directory
> that gets sucked in is starting to become a big burden as the number of tools grows.  Let's
> rework this to be smarter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
