hadoop-mapreduce-dev mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
Date Tue, 06 Sep 2011 16:30:49 GMT
On Tue, Sep 6, 2011 at 10:58 AM, Mithun Radhakrishnan <
mithun.radhakrishnan@yahoo.com> wrote:

> I'm leaning towards creating a trunk/hadoop-tools/hadoop-distcp (etc.). I'm
> hoping that's going to be acceptable to this forum. This way, moving it out
> to a separate source tree should be easier.
>


+1 for moving forward with this proposal.

We still need to answer Amareshwari's question (2), asked some time back,
about automated compilation and test execution for the tools module.
Right now we have separate automated builds for common, hdfs and mapreduce.
If we go with the above proposal, we need to set up automated builds for the
tools modules and possibly tie the related JIRA/Jenkins emails to the
common-project lists.
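
To make the proposed layout concrete, a trunk/hadoop-tools aggregator POM
might look roughly like the sketch below. It is only an illustration: the
parent coordinates, relative path and version are assumptions, not taken
from the DistCpV2 patch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical trunk/hadoop-tools/pom.xml; all coordinates are assumed. -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- Assumed parent; the real parent POM may differ. -->
  <parent>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-project</artifactId>
    <version>0.23.0-SNAPSHOT</version>
    <relativePath>../hadoop-project</relativePath>
  </parent>

  <artifactId>hadoop-tools</artifactId>
  <!-- "pom" packaging makes this an aggregator with no artifact of its own. -->
  <packaging>pom</packaging>

  <!-- Each tool is a child module; more can be listed here later. -->
  <modules>
    <module>hadoop-distcp</module>
  </modules>
</project>
```

With an aggregator like this, a single automated build pointed at
trunk/hadoop-tools would compile and test every tool listed under modules.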



> It would be nice to have clarity on how tools will be dealt with. It'd be
> convenient to have distcp in trunk. (It's tiny and useful.) On the other hand,
> that might be opening doors to adding too much, and complicating the
> build/release. I'd appreciate advice on which way is best.
>
> In the meantime, I'll align the distcpv2 pom.xml with the maven-ized
> version of things, as per Tucu's suggestions.
>
>
+1


Thanks,
+Vinod



> ________________________________
> From: Vinod Kumar Vavilapalli <vinodkv@hortonworks.com>
> To: mapreduce-dev@hadoop.apache.org
> Cc: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>; Mithun
> Radhakrishnan <mithun.radhakrishnan@yahoo.com>
> Sent: Tuesday, August 30, 2011 6:13 PM
> Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
>
> As long as hadoop-tools is in some directory at some depth under trunk,
> release of the hadoop-tools is tied to the release of core.
>
> So we actually have these two options instead:
> (1) Separate source tree (http://svn.apache.org/repos/asf/hadoop/tools)
>     -- Sources at tools/trunk/hadoop-distcp
>     -- Each tool will work with specific version of Hadoop core.
>     -- Releases can really be separate
> (2) Same source tree: trunk/
>     -- Sources at either (2.1) trunk/hadoop-tools or (2.2)
> trunk/hadoop-mapreduce-project/hadoop-mr-tools/hadoop-distcp/
>     -- Given release isn't decoupled anyway, either will work. (2.2) is
> preferable if building mapreduce builds the tools also.
>
> +Vinod
>
>
> On Tue, Aug 30, 2011 at 1:31 PM, Amareshwari Sri Ramadasu <
> amarsri@yahoo-inc.com> wrote:
>
> > Copying common-dev.
> >
> > Summarizing the below discussion: What should be the tools layout after
> > mavenization?
> >
> > Option #1: Have hadoop-tools at top level i.e
> > trunk/
> >   hadoop-tools/
> >       hadoop-distcp/
> > Pros:
> >  Cleaner layout.
> >  In future, tools could be released separately from  Hadoop releases
> >
> > Cons: Difficult to maintain
> >
> > Option #2: Keep the tools aggregator module for MapReduce/HDFS/Common if
> > they are depending on MapReduce/HDFS/Common respectively.
> > For ex:
> > hadoop-mapreduce-project/
> >   hadoop-mr-tools/
> >      hadoop-distcp/
> >
> > Pros: Easy to maintain
> > Cons: Still has tight coupling with related projects.
> >
> > Personally, I'm fine with any of the above options. Looking for
> > suggestions and reaching a consensus on this.
> >
> > Thanks
> > Amareshwari
> >
> > On 8/30/11 12:10 AM, "Allen Wittenauer" <aw@apache.org> wrote:
> >
> >
> >
> > I have a feeling this discussion should get moved to common-dev or even
> > to general.
> >
> > My #1 question is if tools is basically contrib reborn.  If not, what
> > makes it different?
> >
> > On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:
> >
> > > Some questions on making hadoop-tools top level under trunk,
> > >
> > > 1.  Should the patches for tools be created against Hadoop Common?
> > > 2.  What will happen to the tools test automation? Will it run as part
> > >     of Hadoop Common tests?
> > > 3.  Will it introduce a dependency from MapReduce to Common? Or is this
> > >     taken care of in Mavenization?
> > >
> > >
> > > Thanks
> > > Amareshwari
> > >
> > > On 8/26/11 10:17 PM, "Alejandro Abdelnur" <tucu@cloudera.com> wrote:
> > >
> > > Please, don't add more Mavenization work on us (eventually I want to go
> > > back to coding)
> > >
> > > Given that Hadoop is already Mavenized, the patch should be Mavenized.
> > >
> > > What will have to be done extra (besides Mavenizing distcp) is to
> > > create a hadoop-tools module at root level and within it a
> > > hadoop-distcp module.
> > >
> > > The hadoop-tools POM will look pretty much like the
> > > hadoop-common-project POM.
> > >
> > > The hadoop-distcp POM should follow the hadoop-common POM patterns.
> > >
> > > Thanks.
> > >
> > > Alejandro
> > >
> > > On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
> > > amarsri@yahoo-inc.com> wrote:
> > >
> > >> Agree with Mithun and Robert. DistCp and Tools restructuring are
> > >> separate tasks. Since DistCp code is ready to be committed, it need not
> > >> wait for the Tools separation from MR/HDFS.
> > >> I would say it can go into contrib as the patch is now, and when the
> > >> tools restructuring happens it would be just an svn mv.  If there are no
> > >> issues with this proposal I can commit the code tomorrow.
> > >>
> > >> Thanks
> > >> Amareshwari
> > >>
> > >> On 8/26/11 7:45 PM, "Robert Evans" <evans@yahoo-inc.com> wrote:
> > >>
> > >> I agree with Mithun.  They are related but this goes beyond distcpv2
> > >> and should not block distcpv2 from going in.  It would be very nice,
> > >> however, to get the layout settled soon so that we all know where to
> > >> find something when we want to work on it.
> > >>
> > >> Also +1 for Alejandro's. I also prefer to keep tools at the trunk
> > >> level.
> > >>
> > >> Even though HDFS, Common, and Mapreduce and perhaps soon tools are
> > >> separate modules right now, there is still tight coupling between the
> > >> different pieces, especially with tests.  IMO until we can reduce that
> > >> coupling we should treat building and testing Hadoop as a single project
> > >> instead of trying to keep them separate.
> > >>
> > >> --Bobby
> > >>
> > >> On 8/26/11 7:45 AM, "Mithun Radhakrishnan"
> > >> <mithun.radhakrishnan@yahoo.com> wrote:
> > >>
> > >> Would it be acceptable if retooling of tools/ were taken up separately?
> > >> It sounds to me like this might be a distinct (albeit related) task.
> > >>
> > >> Mithun
> > >>
> > >>
> > >> ________________________________
> > >> From: Giridharan Kesavan <gkesavan@hortonworks.com>
> > >> To: mapreduce-dev@hadoop.apache.org
> > >> Sent: Friday, August 26, 2011 12:04 PM
> > >> Subject: Re: DistCpV2 in 0.23
> > >>
> > >> +1 to Alejandro's
> > >>
> > >> I prefer to keep the hadoop-tools at trunk level.
> > >>
> > >> -Giri
> > >>
> > >> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <tucu@cloudera.com>
> > >> wrote:
> > >>> I'd suggest putting hadoop-tools either at trunk/ level or having a
> > >>> tools aggregator module for hdfs and another for common.
> > >>>
> > >>> I personally would prefer at trunk/.
> > >>>
> > >>> Thanks.
> > >>>
> > >>> Alejandro
> > >>>
> > >>> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
> > >>> amarsri@yahoo-inc.com> wrote:
> > >>>
> > >>>> Agree. It should be a separate maven module (and the patch puts it as
> > >>>> a separate maven module now). And top level for hadoop tools is nice
> > >>>> to have, but it becomes hard to maintain until patch automation runs
> > >>>> the tests under tools. Currently we often see changes in HDFS
> > >>>> affecting RAID tests in MapReduce. So, I'm fine putting the tools
> > >>>> under hadoop-mapreduce.
> > >>>>
> > >>>> I propose we can have something like the following:
> > >>>>
> > >>>> trunk/
> > >>>> - hadoop-mapreduce
> > >>>>     - hadoop-mr-client
> > >>>>     - hadoop-yarn
> > >>>>     - hadoop-tools
> > >>>>         - hadoop-streaming
> > >>>>         - hadoop-archives
> > >>>>         - hadoop-distcp
> > >>>>
> > >>>> Thoughts?
> > >>>>
> > >>>> @Eli and @JD, we did not replace the old legacy distcp because this is
> > >>>> really a complete rewrite and we did not want to remove it until users
> > >>>> are familiar with the new one.
> > >>>>
> > >>>> On 8/26/11 12:51 AM, "Todd Lipcon" <todd@cloudera.com> wrote:
> > >>>>
> > >>>> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
> > >>>> in there as well - ie tools that are downstream of MR and/or HDFS.
> > >>>>
> > >>>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar
> > >>>> <mahadev@hortonworks.com> wrote:
> > >>>>> +1 for a separate module in hadoop-mapreduce-project. I think
> > >>>>> hadoop-mapreduce-client might not be the right place for it. We might
> > >>>>> have to pick a new maven module under hadoop-mapreduce-project that
> > >>>>> could host streaming/distcp/hadoop archives.
> > >>>>>
> > >>>>> thanks
> > >>>>> mahadev
> > >>>>>
> > >>>>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur
> > >>>>> <tucu@cloudera.com> wrote:
> > >>>>>> Agree, it should be a separate maven module.
> > >>>>>>
> > >>>>>> And it should be under hadoop-mapreduce-client, right?
> > >>>>>>
> > >>>>>> And now that we are on the topic, the same should go for streaming,
> > >>>>>> no?
> > >>>>>>
> > >>>>>> Thanks.
> > >>>>>>
> > >>>>>> Alejandro
> > >>>>>>
> > >>>>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <todd@cloudera.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <eli@cloudera.com>
> > >>>>>>> wrote:
> > >>>>>>>> Nice work!   I definitely think this should go in 23 and 20x.
> > >>>>>>>>
> > >>>>>>>> Agree with JD that it should be in the core code, not contrib. If
> > >>>>>>>> it's going to be maintained then we should put it in the core
> > >>>>>>>> code.
> > >>>>>>>
> > >>>>>>> Now that we're all mavenized, though, a separate maven module and
> > >>>>>>> artifact does make sense IMO - ie "hadoop jar
> > >>>>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
> > >>>>>>>
> > >>>>>>> -Todd
> > >>>>>>> --
> > >>>>>>> Todd Lipcon
> > >>>>>>> Software Engineer, Cloudera
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Todd Lipcon
> > >>>> Software Engineer, Cloudera
> > >>>>
> > >>>>
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> -Giri
> > >>
> > >>
> > >>
> > >
> >
> >
> >
>
