hadoop-common-dev mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
Date Mon, 12 Sep 2011 13:47:36 GMT
Alright, I think we've discussed enough on this and everybody seems to agree
about a top level hadoop-tools module.

Time to get into action. I've filed HADOOP-7624. Amareshwari, we can
track the remaining implementation details, and the answers to your
specific questions, there.

Thanks everyone for putting in your thoughts here.
+Vinod


On Fri, Sep 9, 2011 at 10:55 AM, Rottinghuis, Joep <jrottinghuis@ebay.com> wrote:

> If hadoop-tools will be built as part of hadoop-common, then none of these
> tools should be allowed to have a dependency on hdfs or mapreduce.
> The converse is also true: when tools do have any such dependency, they
> cannot be built as part of hadoop-common.
> We cannot have circular dependencies like that.
>
> That is probably obvious, but I'm just saying...
>
> Joep
> ________________________________________
> From: Amareshwari Sri Ramadasu [amarsri@yahoo-inc.com]
> Sent: Wednesday, September 07, 2011 9:33 PM
> To: mapreduce-dev@hadoop.apache.org
> Cc: common-dev@hadoop.apache.org
> Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
>
> It is good to have hadoop-tools module separately. But as I asked before we
> need to answer some questions here. I'm trying to answer them myself.
> Comments are welcome.
>
> > > 1.  Should the patches for tools be created against Hadoop Common?
> Here, I meant: should the Hadoop common mailing list be used, or should we
> have a separate mailing list for Tools? I agree with Vinod here that we can
> tie it to the Hadoop-common jira/mailing lists.
>
> > > 2.  What will happen to the tools test automation? Will it run as part
> > > of Hadoop Common tests?
> Jenkins nightly/patch builds for Hadoop tools can run as part of Hadoop
> common if we use the Hadoop common mailing list for this.
> Also, I propose every patch build of HDFS and MAPREDUCE should also run
> tools tests to make sure nothing is broken. That would ease the maintenance
> of the hadoop-tools module. I presume tools tests should not take much time
> (something like not more than 30 minutes).
>
> > > 3.  Will it introduce a dependency from MapReduce to Common? Or is this
> > > taken care of in Mavenization?
> I'm not sure whether Mavenization can take care of this.
>
> Thanks
> Amareshwari
>
> On 9/8/11 9:13 AM, "Rottinghuis, Joep" <jrottinghuis@ebay.com> wrote:
>
> Does a separate hadoop-tools module imply that there will be a separate
> Jenkins build as well?
>
> Thanks,
>
> Joep
> ________________________________________
> From: Alejandro Abdelnur [tucu@cloudera.com]
> Sent: Wednesday, September 07, 2011 11:35 AM
> To: mapreduce-dev@hadoop.apache.org
> Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
>
> Makes sense
>
> On Wed, Sep 7, 2011 at 11:32 AM, <Milind.Bhandarkar@emc.com> wrote:
>
> > +1 for separate hadoop-tools module. However, if a tool is broken at
> > release time, and no one comes forward to fix it, it should be removed.
> > (i.e. Unlike contrib modules, where build and test failures were
> > tolerated.)
> >
> > - milind
> >
> > On 9/7/11 11:27 AM, "Mahadev Konar" <mahadev@hortonworks.com> wrote:
> >
> > >I like the idea of having tools as a separate module and I don't think
> > >that it will be a dumping ground unless we choose to make it one.
> > >
> > >+1 for hadoop tools module under trunk.
> > >
> > >thanks
> > >mahadev
> > >
> > >On Wed, Sep 7, 2011 at 11:18 AM, Alejandro Abdelnur <tucu@cloudera.com>
> > >wrote:
> > >> Agreed, we should not have a dumping ground. IMO, what would go into
> > >> hadoop-tools (i.e. distcp, streaming and someone could argue for FsShell as
> > >> well) are effectively hadoop CLI utilities. Having them in a separate module
> > >> rather than in the core modules (common, hdfs, mapreduce) does not mean
> > >> that they are secondary things, just modularization. Also it will help to
> > >> get those tools to use public interfaces of the core modules, and when we
> > >> finally have a clean hadoop-client layer, those tools should only depend on
> > >> that.
> > >>
> > >> Finally, the fact that tools would end up under trunk/hadoop-tools does
> > >> not prevent the packaging of HDFS and MAPREDUCE from bundling the
> > >> same/different tools.
> > >>
> > >> +1 for hadoop-tools/ (not binding)
> > >>
> > >> Thanks.
> > >>
> > >>
> > >> On Wed, Sep 7, 2011 at 10:50 AM, Eric Yang <eric818@gmail.com> wrote:
> > >>
> > >>> Mapreduce and HDFS are distinct functions of Hadoop.  They are loosely
> > >>> coupled.  If we have a tools aggregator module, it will not have as
> > >>> clear and distinct a function as other Hadoop modules.  Hence, it is
> > >>> possible for a tool to depend on both HDFS and map reduce.  If
> > >>> something breaks in the tools module, it is unclear which subproject is
> > >>> responsible for maintaining the tools' function.  Therefore, it is safer to
> > >>> send tools to incubator or apache extra rather than deposit the
> > >>> utility tools in a tools subcategory.  There are many short-lived
> > >>> projects that attempt to associate themselves with Hadoop but are not
> > >>> being maintained.  It would be better to spin off those utility
> > >>> projects than to use Hadoop as a dumping ground.
> > >>>
> > >>> In the previous discussion about removing contrib, most people were in
> > >>> favor of doing so, and only a few contrib owners were reluctant to
> > >>> remove contrib.  Fewer people have participated in restoring the
> > >>> functionality of broken contrib projects.  History speaks for itself.
> > >>> -1 (non-binding) for hadoop-tools.
> > >>>
> > >>> regards,
> > >>> Eric
> > >>>
> > >>> On Tue, Sep 6, 2011 at 6:55 PM, Alejandro Abdelnur <tucu@cloudera.com> wrote:
> > >>> > Eric,
> > >>> >
> > >>> > Personally I'm fine either way.
> > >>> >
> > >>> > Still, I fail to see why generic/categorized tools increase/reduce the
> > >>> > risk of dead code and how they make packaging and deployment
> > >>> > more difficult/easier.
> > >>> >
> > >>> > Would you please explain this?
> > >>> >
> > >>> > Thanks.
> > >>> >
> > >>> > Alejandro
> > >>> >
> > >>> > On Tue, Sep 6, 2011 at 6:38 PM, Eric Yang <eric818@gmail.com> wrote:
> > >>> >
> > >>> >> Option #2, proposed by Amareshwari, seems like a better proposal.  We don't
> > >>> >> want to repeat the history of contrib with hadoop-tools.  Having a
> > >>> >> generic module like hadoop-tools increases the risk of accumulating dead code.
> > >>> >> It would be better to categorize the hdfs or mapreduce specific tools in
> > >>> >> their respective subcategories.  It is also easier to manage from a
> > >>> >> package/deployment perspective.
> > >>> >>
> > >>> >> regards,
> > >>> >> Eric
> > >>> >>
> > >>> >> On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:
> > >>> >>
> > >>> >> > On Tue, Sep 6, 2011 at 10:11 AM, Allen Wittenauer <aw@apache.org> wrote:
> > >>> >> >>
> > >>> >> >> On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
> > >>> >> >>> We still need to answer Amareshwari's question (2) she asked some time back
> > >>> >> >>> about the automated code compilation and test execution of the tools module.
> > >>> >> >>
> > >>> >> >>
> > >>> >> >>
> > >>> >> >>>>> My #1 question is if tools is basically contrib reborn.  If not, what makes
> > >>> >> >>>>> it different?
> > >>> >> >>
> > >>> >> >>
> > >>> >> >>        I'm still waiting for this answer as well.
> > >>> >> >>
> > >>> >> >>        Until such, I would be pretty much against a tools module.
> > >>> >> >>  Changing the name of the dumping ground doesn't make it any less of a
> > >>> >> >>  dumping ground.
> > >>> >> >
> > >>> >> > IMO if the tools module only gets stuff like distcp that's maintained
> > >>> >> > then it's not contrib, if it contains all the stuff from the current
> > >>> >> > MR contrib then tools is just a re-labeling of contrib. Given that
> > >>> >> > this proposal only covers moving distcp to tools it doesn't sound like
> > >>> >> > contrib to me.
> > >>> >> >
> > >>> >> > Thanks,
> > >>> >> > Eli
> > >>> >>
> > >>> >>
> > >>> >
> > >>>
> > >>
> > >
> >
> >
>
>
