hadoop-mapreduce-dev mailing list archives

From Alejandro Abdelnur <t...@cloudera.com>
Subject Re: DistCpV2 in 0.23
Date Fri, 26 Aug 2011 16:47:36 GMT
Please don't add more Mavenization work on us (eventually I want to go back
to coding).

Given that Hadoop is already Mavenized, the patch should be Mavenized.

The extra work required (besides Mavenizing distcp) is to create a
hadoop-tools module at the root level, with a hadoop-distcp module inside it.

The hadoop-tools POM will look pretty much like the hadoop-common-project
POM.

The hadoop-distcp POM should follow the hadoop-common POM patterns.

Thanks.

Alejandro

On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
amarsri@yahoo-inc.com> wrote:

> I agree with Mithun and Robert. DistCp and the Tools restructuring are separate
> tasks. Since the DistCp code is ready to be committed, it need not wait for the
> Tools separation from MR/HDFS.
> I would say it can go into contrib as the patch is now, and when the tools
> restructuring happens, moving it would just be an svn mv. If there are no issues
> with this proposal, I can commit the code tomorrow.
>
> Thanks
> Amareshwari
>
> On 8/26/11 7:45 PM, "Robert Evans" <evans@yahoo-inc.com> wrote:
>
> I agree with Mithun.  They are related but this goes beyond distcpv2 and
> should not block distcpv2 from going in.  It would be very nice, however, to
> get the layout settled soon so that we all know where to find something when
> we want to work on it.
>
> Also +1 to Alejandro's suggestion; I also prefer to keep tools at the trunk level.
>
> Even though HDFS, Common, MapReduce, and perhaps soon Tools are separate
> modules right now, there is still tight coupling between the different
> pieces, especially in the tests. IMO, until we can reduce that coupling, we
> should treat building and testing Hadoop as a single project instead of
> trying to keep the pieces separate.
>
> --Bobby
>
> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <mithun.radhakrishnan@yahoo.com>
> wrote:
>
> Would it be acceptable if retooling of tools/ were taken up separately? It
> sounds to me like this might be a distinct (albeit related) task.
>
> Mithun
>
>
> ________________________________
> From: Giridharan Kesavan <gkesavan@hortonworks.com>
> To: mapreduce-dev@hadoop.apache.org
> Sent: Friday, August 26, 2011 12:04 PM
> Subject: Re: DistCpV2 in 0.23
>
> +1 to Alejandro's suggestion.
>
> I prefer to keep the hadoop-tools at trunk level.
>
> -Giri
>
> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <tucu@cloudera.com>
> wrote:
> > I'd suggest putting hadoop-tools either at the trunk/ level or having a tools
> > aggregator module for hdfs and another for common.
> >
> > I personally would prefer trunk/.
> >
> > Thanks.
> >
> > Alejandro
> >
> > On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
> > amarsri@yahoo-inc.com> wrote:
> >
> >> Agree. It should be a separate maven module (and the patch puts it in a
> >> separate maven module now). A top level for hadoop tools is nice to have, but
> >> it becomes hard to maintain until the patch automation runs the tests under
> >> tools. We often see changes in HDFS affecting the RAID tests in MapReduce.
> >> So, I'm fine putting the tools under hadoop-mapreduce.
> >>
> >> I propose we can have something like the following:
> >>
> >> trunk/
> >>  - hadoop-mapreduce
> >>      - hadoop-mr-client
> >>      - hadoop-yarn
> >>      - hadoop-tools
> >>          - hadoop-streaming
> >>          - hadoop-archives
> >>          - hadoop-distcp
> >>
> >> Thoughts?
> >>
> >> @Eli and @JD, we did not replace the old legacy distcp because this is
> >> really a complete rewrite, and we did not want to remove the old one until
> >> users are familiar with the new one.
> >>
> >> On 8/26/11 12:51 AM, "Todd Lipcon" <todd@cloudera.com> wrote:
> >>
> >> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
> >> in there as well - ie tools that are downstream of MR and/or HDFS.
> >>
> >> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <mahadev@hortonworks.com> wrote:
> >> > +1 for a separate module in hadoop-mapreduce-project. I think
> >> > hadoop-mapreduce-client might not be the right place for it. We might have
> >> > to pick a new maven module under hadoop-mapreduce-project that could
> >> > host streaming/distcp/hadoop archives.
> >> >
> >> > thanks
> >> > mahadev
> >> >
> >> > On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <tucu@cloudera.com> wrote:
> >> >> Agree, it should be a separate maven module.
> >> >>
> >> >> And it should be under hadoop-mapreduce-client, right?
> >> >>
> >> >> And while we are on the topic, the same should go for streaming, no?
> >> >>
> >> >> Thanks.
> >> >>
> >> >> Alejandro
> >> >>
> >> >> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <todd@cloudera.com> wrote:
> >> >>
> >> >>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <eli@cloudera.com> wrote:
> >> >>> > Nice work! I definitely think this should go in 23 and 20x.
> >> >>> >
> >> >>> > Agree with JD that it should be in the core code, not contrib. If
> >> >>> > it's going to be maintained, then we should put it in the core code.
> >> >>>
> >> >>> Now that we're all mavenized, though, a separate maven module and
> >> >>> artifact does make sense IMO - ie "hadoop jar
> >> >>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
> >> >>>
> >> >>> -Todd
> >> >>> --
> >> >>> Todd Lipcon
> >> >>> Software Engineer, Cloudera
> >> >>>
> >> >>
> >> >
> >>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >>
> >
>
>
>
> --
> -Giri
>
>
>
