hadoop-mapreduce-dev mailing list archives

From "Rottinghuis, Joep" <jrottingh...@ebay.com>
Subject RE: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
Date Thu, 08 Sep 2011 03:43:33 GMT
Does a separate hadoop-tools module imply that there will be a separate Jenkins build as well?

Thanks,

Joep
________________________________________
From: Alejandro Abdelnur [tucu@cloudera.com]
Sent: Wednesday, September 07, 2011 11:35 AM
To: mapreduce-dev@hadoop.apache.org
Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Makes sense

On Wed, Sep 7, 2011 at 11:32 AM, <Milind.Bhandarkar@emc.com> wrote:

> +1 for separate hadoop-tools module. However, if a tool is broken at
> release time, and no one comes forward to fix it, it should be removed.
> (i.e. Unlike contrib modules, where build and test failures were
> tolerated.)
>
> - milind
>
> On 9/7/11 11:27 AM, "Mahadev Konar" <mahadev@hortonworks.com> wrote:
>
> >I like the idea of having tools as a separate module and I don't think
> >that it will be a dumping ground unless we choose to make it one.
> >
> >+1 for hadoop tools module under trunk.
> >
> >thanks
> >mahadev
> >
> >On Wed, Sep 7, 2011 at 11:18 AM, Alejandro Abdelnur <tucu@cloudera.com>
> >wrote:
> >> Agreed, we should not have a dumping ground. IMO, what would go into
> >> hadoop-tools (i.e. distcp, streaming, and someone could argue for FsShell
> >> as well) are effectively hadoop CLI utilities. Having them in a separate
> >> module rather than in the core modules (common, hdfs, mapreduce) does not
> >> mean that they are secondary things, just modularization. Also, it will
> >> help to get those tools to use public interfaces of the core modules, and
> >> when we finally have a clean hadoop-client layer, those tools should only
> >> depend on that.
> >>
> >> Finally, the fact that tools would end up under trunk/hadoop-tools does
> >> not prevent the packaging of HDFS and MAPREDUCE from bundling the
> >> same/different tools.
> >>
> >> +1 for hadoop-tools/ (not binding)
> >>
> >> Thanks.
> >>
> >>
> >> On Wed, Sep 7, 2011 at 10:50 AM, Eric Yang <eric818@gmail.com> wrote:
> >>
> >>> Mapreduce and HDFS are distinct functions of Hadoop.  They are loosely
> >>> coupled.  If we have a tools aggregator module, it will not have as
> >>> clearly distinct a function as other Hadoop modules.  Hence, it is
> >>> possible for a tool to depend on both HDFS and map reduce.  If
> >>> something breaks in the tools module, it is unclear which subproject is
> >>> responsible for maintaining the tools function.  Therefore, it is safer
> >>> to send tools to incubator or apache extra rather than deposit the
> >>> utility tools in a tools subcategory.  There are many short-lived
> >>> projects that attempt to associate themselves with Hadoop but are not
> >>> being maintained.  It would be better to spin off those utility
> >>> projects than use Hadoop as a dumping ground.
> >>>
> >>> In the previous discussion about removing contrib, most people were in
> >>> favor of doing so, and only a few contrib owners were reluctant to
> >>> remove contrib.  Fewer people have participated in restoring the
> >>> functionality of broken contrib projects.  History speaks for itself.
> >>> -1 (non-binding) for hadoop-tools.
> >>>
> >>> regards,
> >>> Eric
> >>>
> >>> On Tue, Sep 6, 2011 at 6:55 PM, Alejandro Abdelnur <tucu@cloudera.com>
> >>> wrote:
> >>> > Eric,
> >>> >
> >>> > Personally I'm fine either way.
> >>> >
> >>> > Still, I fail to see why generic/categorized tools increase/reduce the
> >>> > risk of dead code and how they make packaging & deployment
> >>> > more difficult/easier.
> >>> >
> >>> > Would you please explain this?
> >>> >
> >>> > Thanks.
> >>> >
> >>> > Alejandro
> >>> >
> >>> > On Tue, Sep 6, 2011 at 6:38 PM, Eric Yang <eric818@gmail.com> wrote:
> >>> >
> >>> >> Option #2 proposed by Amareshwari seems like a better proposal.  We
> >>> >> don't want to repeat the history of contrib with hadoop-tools.  Having a
> >>> >> generic module like hadoop-tools increases the risk of accumulating dead
> >>> >> code.  It would be better to categorize the hdfs- or mapreduce-specific
> >>> >> tools in their respective subcategories.  It is also easier to manage
> >>> >> from a package/deployment perspective.
> >>> >>
> >>> >> regards,
> >>> >> Eric
> >>> >>
> >>> >> On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:
> >>> >>
> >>> >> > On Tue, Sep 6, 2011 at 10:11 AM, Allen Wittenauer <aw@apache.org> wrote:
> >>> >> >>
> >>> >> >> On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
> >>> >> >>> We still need to answer Amareshwari's question (2) she asked some
> >>> >> >>> time back about the automated code compilation and test execution
> >>> >> >>> of the tools module.
> >>> >> >>
> >>> >> >>
> >>> >> >>
> >>> >> >>>>> My #1 question is if tools is basically contrib reborn.  If not,
> >>> >> >>>>> what makes it different?
> >>> >> >>
> >>> >> >>
> >>> >> >>        I'm still waiting for this answer as well.
> >>> >> >>
> >>> >> >>        Until such, I would be pretty much against a tools module.
> >>> >> >>  Changing the name of the dumping ground doesn't make it any less of
> >>> >> >> a dumping ground.
> >>> >> >
> >>> >> > IMO if the tools module only gets stuff like distcp that's maintained
> >>> >> > then it's not contrib, if it contains all the stuff from the current
> >>> >> > MR contrib then tools is just a re-labeling of contrib. Given that
> >>> >> > this proposal only covers moving distcp to tools it doesn't sound like
> >>> >> > contrib to me.
> >>> >> >
> >>> >> > Thanks,
> >>> >> > Eli
> >>> >>
> >>> >>
> >>> >
> >>>
> >>
> >
>
>
