hadoop-common-issues mailing list archives

From "Robert Joseph Evans (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8009) Create hadoop-client and hadoop-test artifacts for downstream projects
Date Wed, 01 Feb 2012 15:01:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197885#comment-13197885 ]

Robert Joseph Evans commented on HADOOP-8009:
---------------------------------------------

I like the patch, but I am a bit concerned about the scope of hadoop-client.  There appears
to be an effort underway to add other computing models on top of YARN as part of Hadoop
itself; MPI with Hamster is the one that seems to be the furthest along.  Would we add these
in as well?  If not, then I would prefer to see hadoop-client named something with mapreduce
in it, because what this really creates is a MapReduce client package for Pig, Hive, and
other MapReduce users.

Other projects that just want to use YARN to write their own application master would not
really want to use it, because they would be pulling in all of the MapReduce client as well.
Also, what about a project that just wants to use HDFS?  Does it want to pull in all of YARN
and MapReduce?  Perhaps providing consumers of other parts of Hadoop with similar functionality
is beyond the scope of this ticket, and if so that is fine.  I just want to understand a little
better what the intention is for this JIRA before I give it a +1.
                
> Create hadoop-client and hadoop-test artifacts for downstream projects 
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-8009
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8009
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build
>    Affects Versions: 0.22.0, 0.23.0, 0.24.0, 0.23.1, 1.0.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>            Priority: Critical
>         Attachments: HADOOP-8009.patch
>
>
> Using Hadoop from projects like Pig/Hive/Sqoop/Flume/Oozie or any in-house system that
interacts with Hadoop is quite challenging for the following reasons:
> * *Different versions of Hadoop produce different artifacts:* Before Hadoop 0.23 there
was a single artifact, hadoop-core; starting with Hadoop 0.23 there are several (common, hdfs,
mapred*, yarn*)
> * *There are no 'client' artifacts:* Current artifacts include all JARs needed to run
the services, thus bringing into clients several JARs that are not used for job submission/monitoring
(servlet, jsp, tomcat, jersey, etc.)
> * *Doing testing on the client side is also quite challenging as more artifacts have
to be included than the dependencies define:* for example, the history-server artifact has
to be explicitly included. If using Hadoop 1 artifacts, jersey-server has to be explicitly
included.
> * *3rd party dependencies change in Hadoop from version to version:* This makes things
complicated for projects that have to deal with multiple versions of Hadoop, as their exclusion
lists become a huge mix & match of artifacts from different Hadoop versions, and it may
break things when a particular version of Hadoop requires a dependency that another version
of Hadoop does not require.
> Because of this it would be quite convenient to have the following 'aggregator' artifacts:
> * *org.apache.hadoop:hadoop-client* : it includes all required JARs to use Hadoop client
APIs (excluding all JARs that are not needed for it)
> * *org.apache.hadoop:hadoop-test* : it includes all required JARs to run Hadoop Mini
Clusters
> These aggregator artifacts would be created for current branches under development (trunk,
0.22, 0.23, 1.0) and for released versions that are still in use.
> For branches under development, these artifacts would be generated as part of the build.
> For released versions we would have a special branch used only as a vehicle for publishing
the corresponding 'aggregator' artifacts.
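
As an illustration of the aggregator artifacts proposed in the description above, a downstream
project's pom.xml might end up looking roughly like the fragment below. This is only a sketch:
the group and artifact IDs are the ones named in the proposal, the hadoop.version property is
a placeholder assumed for this example, and the test scope on hadoop-test is an assumption
about how a project would typically consume a Mini Cluster artifact.

```xml
<!-- Sketch only: artifact names follow the proposal in HADOOP-8009.
     ${hadoop.version} is a placeholder property assumed for this example. -->
<dependencies>
  <!-- Client-side APIs without the server-only JARs (servlet, jsp, tomcat, jersey, etc.) -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
  <!-- Mini Cluster support, assumed here to be needed only when running tests -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-test</artifactId>
    <version>${hadoop.version}</version>
    <scope>test</scope>
  </dependency>
</dependencies>
```

The intent would be that switching Hadoop versions only requires changing the version property,
rather than maintaining a per-version exclusion list in each downstream project.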
