hadoop-mapreduce-dev mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
Date Tue, 30 Aug 2011 12:43:05 GMT
As long as hadoop-tools is in some directory at some depth under trunk,
release of the hadoop-tools is tied to the release of core.

So we actually have these two options instead:
(1) Separate source tree (http://svn.apache.org/repos/asf/hadoop/tools)
    -- Sources at tools/trunk/hadoop-distcp
    -- Each tool will work with a specific version of Hadoop core.
    -- Releases can really be separate
(2) Same source tree: trunk/
    -- Sources at either (2.1) trunk/hadoop-tools or (2.2)
       trunk/hadoop-mapreduce-project/hadoop-mr-tools/hadoop-distcp/
    -- Given that the release isn't decoupled anyway, either will work. (2.2) is
       preferable if building mapreduce also builds the tools.

+Vinod


On Tue, Aug 30, 2011 at 1:31 PM, Amareshwari Sri Ramadasu <
amarsri@yahoo-inc.com> wrote:

> Copying common-dev.
>
> Summarizing the below discussion: What should be the tools layout after
> mavenization?
>
> Option #1: Have hadoop-tools at top level i.e
> trunk/
>   hadoop-tools/
>       hadoop-distcp/
> Pros:
>  Cleaner layout.
>  In future, tools could be released separately from Hadoop releases
>
> Cons: Difficult to maintain
>
> Option #2: Keep the tools aggregator module for MapReduce/HDFS/Common if
> they are depending on MapReduce/HDFS/Common respectively.
> For ex:
> hadoop-mapreduce-project/
>   hadoop-mr-tools/
>      hadoop-distcp/
>
> Pros: Easy to maintain
> Cons: Still has tight coupling with related projects.
>
> Personally, I'm fine with any of the above options. Looking for suggestions
> and reaching a consensus on this.
>
> Thanks
> Amareshwari
>
> On 8/30/11 12:10 AM, "Allen Wittenauer" <aw@apache.org> wrote:
>
>
>
> I have a feeling this discussion should get moved to common-dev or even to
> general.
>
> My #1 question is if tools is basically contrib reborn.  If not, what makes
> it different?
>
> On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:
>
> > Some questions on making hadoop-tools top level under trunk,
> >
> > 1.  Should the patches for tools be created against Hadoop Common?
> > 2.  What will happen to the tools test automation? Will it run as part of
> >     Hadoop Common tests?
> > 3.  Will it introduce a dependency from MapReduce to Common? Or is this
> >     taken care of in Mavenization?
> >
> >
> > Thanks
> > Amareshwari
> >
> > On 8/26/11 10:17 PM, "Alejandro Abdelnur" <tucu@cloudera.com> wrote:
> >
> > Please, don't add more Mavenization work on us (eventually I want to go
> > back to coding)
> >
> > Given that Hadoop is already Mavenized, the patch should be Mavenized.
> >
> > What will have to be done extra (besides Mavenizing distcp) is to create a
> > hadoop-tools module at root level and within it a hadoop-distcp module.
> >
> > The hadoop-tools POM will look pretty much like the hadoop-common-project
> > POM.
> >
> > The hadoop-distcp POM should follow the hadoop-common POM patterns.
> >
> > Thanks.
> >
> > Alejandro
> >
> > On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
> > amarsri@yahoo-inc.com> wrote:
> >
> >> Agree with Mithun and Robert. DistCp and Tools restructuring are separate
> >> tasks. Since DistCp code is ready to be committed, it need not wait for the
> >> Tools separation from MR/HDFS.
> >> I would say it can go into contrib as the patch is now, and when the tools
> >> restructuring happens it would be just an svn mv.  If there are no issues
> >> with this proposal I can commit the code tomorrow.
> >>
> >> Thanks
> >> Amareshwari
> >>
> >> On 8/26/11 7:45 PM, "Robert Evans" <evans@yahoo-inc.com> wrote:
> >>
> >> I agree with Mithun.  They are related but this goes beyond distcpv2 and
> >> should not block distcpv2 from going in.  It would be very nice, however, to
> >> get the layout settled soon so that we all know where to find something when
> >> we want to work on it.
> >>
> >> Also +1 for Alejandro's suggestion; I also prefer to keep tools at the trunk level.
> >>
> >> Even though HDFS, Common, and Mapreduce and perhaps soon tools are separate
> >> modules right now, there is still tight coupling between the different
> >> pieces, especially with tests.  IMO until we can reduce that coupling we
> >> should treat building and testing Hadoop as a single project instead of
> >> trying to keep them separate.
> >>
> >> --Bobby
> >>
> >> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <
> mithun.radhakrishnan@yahoo.com>
> >> wrote:
> >>
> >> Would it be acceptable if retooling of tools/ were taken up separately? It
> >> sounds to me like this might be a distinct (albeit related) task.
> >>
> >> Mithun
> >>
> >>
> >> ________________________________
> >> From: Giridharan Kesavan <gkesavan@hortonworks.com>
> >> To: mapreduce-dev@hadoop.apache.org
> >> Sent: Friday, August 26, 2011 12:04 PM
> >> Subject: Re: DistCpV2 in 0.23
> >>
> >> +1 to Alejandro's
> >>
> >> I prefer to keep the hadoop-tools at trunk level.
> >>
> >> -Giri
> >>
> >> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <tucu@cloudera.com>
> >> wrote:
> >>> I'd suggest putting hadoop-tools either at trunk/ level or having a tools
> >>> aggregator module for hdfs and another for common.
> >>>
> >>> I personally would prefer it at trunk/.
> >>>
> >>> Thanks.
> >>>
> >>> Alejandro
> >>>
> >>> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
> >>> amarsri@yahoo-inc.com> wrote:
> >>>
> >>>> Agree. It should be a separate maven module (and the patch puts it as a
> >>>> separate maven module now). And top level for hadoop tools is nice to have,
> >>>> but it becomes hard to maintain until patch automation tests run the tests
> >>>> under tools. Currently we see many times the changes in HDFS affecting RAID
> >>>> tests in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
> >>>>
> >>>> I propose we can have something like the following:
> >>>>
> >>>> trunk/
> >>>> - hadoop-mapreduce
> >>>>     - hadoop-mr-client
> >>>>     - hadoop-yarn
> >>>>     - hadoop-tools
> >>>>         - hadoop-streaming
> >>>>         - hadoop-archives
> >>>>         - hadoop-distcp
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> @Eli and @JD, we did not replace old legacy distcp because this is really a
> >>>> complete rewrite and did not want to remove it until users are familiarized
> >>>> with new one.
> >>>>
> >>>> On 8/26/11 12:51 AM, "Todd Lipcon" <todd@cloudera.com> wrote:
> >>>>
> >>>> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
> >>>> in there as well - ie tools that are downstream of MR and/or HDFS.
> >>>>
> >>>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <
> >> mahadev@hortonworks.com>
> >>>> wrote:
> >>>>> +1 for a separate module in hadoop-mapreduce-project. I think
> >>>>> hadoop-mapreduce-client might not be the right place for it. We might have
> >>>>> to pick a new maven module under hadoop-mapreduce-project that could
> >>>>> host streaming/distcp/hadoop archives.
> >>>>>
> >>>>> thanks
> >>>>> mahadev
> >>>>>
> >>>>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <
> >> tucu@cloudera.com>
> >>>> wrote:
> >>>>>> Agree, it should be a separate maven module.
> >>>>>>
> >>>>>> And it should be under hadoop-mapreduce-client, right?
> >>>>>>
> >>>>>> And now that we are on the topic, the same should go for streaming, no?
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>> Alejandro
> >>>>>>
> >>>>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <todd@cloudera.com>
> >>>> wrote:
> >>>>>>
> >>>>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <eli@cloudera.com>
> >>>> wrote:
> >>>>>>>> Nice work!   I definitely think this should go in 23 and 20x.
> >>>>>>>>
> >>>>>>>> Agree with JD that it should be in the core code, not contrib.  If
> >>>>>>>> it's going to be maintained then we should put it in the core
> >>>>>>>> code.
> >>>>>>>
> >>>>>>> Now that we're all mavenized, though, a separate maven module and
> >>>>>>> artifact does make sense IMO - ie "hadoop jar
> >>>>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
> >>>>>>>
> >>>>>>> -Todd
> >>>>>>> --
> >>>>>>> Todd Lipcon
> >>>>>>> Software Engineer, Cloudera
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Todd Lipcon
> >>>> Software Engineer, Cloudera
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> -Giri
> >>
> >>
> >>
> >
>
>
>
