hadoop-mapreduce-dev mailing list archives

From Mithun Radhakrishnan <mithun.radhakrish...@yahoo.com>
Subject Re: DistCpV2 in 0.23
Date Fri, 26 Aug 2011 17:17:37 GMT
Greetings, Tucu. I'd like very much to take you up on that.

DistCpV2's build is currently mavenized. (Apologies. I neglected to mention that in this mail-thread.)
Could I please bother you to review the pom? As the patch stands now, DistCpV2 needs building
separately.

Grazie,
Mithun


________________________________
From: Alejandro Abdelnur <tucu@cloudera.com>
To: mapreduce-dev@hadoop.apache.org
Sent: Friday, August 26, 2011 10:18 PM
Subject: Re: DistCpV2 in 0.23

And I'll be more than happy to review it from the Mavenization perspective.

Thxs.

Alejandro

On Fri, Aug 26, 2011 at 9:47 AM, Alejandro Abdelnur <tucu@cloudera.com> wrote:

> Please, don't add more Mavenization work on us (eventually I want to go
> back to coding)
>
> Given that Hadoop is already Mavenized, the patch should be Mavenized.
>
> What will have to be done extra (besides Mavenizing distcp) is to create a
> hadoop-tools module at root level and within it a hadoop-distcp module.
>
> The hadoop-tools POM will look pretty much like the hadoop-common-project
> POM.
>
> The hadoop-distcp POM should follow the hadoop-common POM patterns.
>
> Thanks.
>
> Alejandro
>
> On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
> amarsri@yahoo-inc.com> wrote:
>
>> Agree with Mithun and Robert. DistCp and Tools restructuring are separate
>> tasks. Since DistCp code is ready to be committed, it need not wait for the
>> Tools separation from MR/HDFS.
>> I would say it can go into contrib as the patch is now, and when the tools
>> restructuring happens it would be just an svn mv.  If there are no issues
>> with this proposal I can commit the code tomorrow.
>>
>> Thanks
>> Amareshwari
>>
>> On 8/26/11 7:45 PM, "Robert Evans" <evans@yahoo-inc.com> wrote:
>>
>> I agree with Mithun.  They are related but this goes beyond distcpv2 and
>> should not block distcpv2 from going in.  It would be very nice, however, to
>> get the layout settled soon so that we all know where to find something when
>> we want to work on it.
>>
>> Also +1 for Alejandro's suggestion; I also prefer to keep tools at the trunk level.
>>
>> Even though HDFS, Common, and Mapreduce and perhaps soon tools are
>> separate modules right now, there is still tight coupling between the
>> different pieces, especially with tests.  IMO until we can reduce that
>> coupling we should treat building and testing Hadoop as a single project
>> instead of trying to keep them separate.
>>
>> --Bobby
>>
>> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <
>> mithun.radhakrishnan@yahoo.com> wrote:
>>
>> Would it be acceptable if retooling of tools/ were taken up separately? It
>> sounds to me like this might be a distinct (albeit related) task.
>>
>> Mithun
>>
>>
>> ________________________________
>> From: Giridharan Kesavan <gkesavan@hortonworks.com>
>> To: mapreduce-dev@hadoop.apache.org
>> Sent: Friday, August 26, 2011 12:04 PM
>> Subject: Re: DistCpV2 in 0.23
>>
>> +1 to Alejandro's
>>
>> I prefer to keep the hadoop-tools at trunk level.
>>
>> -Giri
>>
>> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <tucu@cloudera.com>
>> wrote:
>> > I'd suggest putting hadoop-tools either at trunk/ level or having a tools
>> > aggregator module for hdfs and another for common.
>> >
>> > I personally would prefer at trunk/.
>> >
>> > Thanks.
>> >
>> > Alejandro
>> >
>> > On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
>> > amarsri@yahoo-inc.com> wrote:
>> >
>> >> Agree. It should be a separate maven module (and the patch puts it as a separate
>> >> maven module now). And a top level for hadoop tools is nice to have, but it
>> >> becomes hard to maintain until patch automation tests run the tests under
>> >> tools. Currently we see many times the changes in HDFS affecting RAID tests
>> >> in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
>> >>
>> >> I propose we can have something like the following:
>> >>
>> >> trunk/
>> >>  - hadoop-mapreduce
>> >>      - hadoop-mr-client
>> >>      - hadoop-yarn
>> >>      - hadoop-tools
>> >>          - hadoop-streaming
>> >>          - hadoop-archives
>> >>          - hadoop-distcp
>> >>
>> >> Thoughts?
>> >>
>> >> @Eli and @JD, we did not replace old legacy distcp because this is really a
>> >> complete rewrite and did not want to remove it until users are familiarized
>> >> with new one.
>> >>
>> >> On 8/26/11 12:51 AM, "Todd Lipcon" <todd@cloudera.com> wrote:
>> >>
>> >> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
>> >> in there as well - ie tools that are downstream of MR and/or HDFS.
>> >>
>> >> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <mahadev@hortonworks.com>
>> >> wrote:
>> >> > +1 for a separate module in hadoop-mapreduce-project. I think
>> >> > hadoop-mapreduce-client might not be the right place for it. We might have
>> >> > to pick a new maven module under hadoop-mapreduce-project that could
>> >> > host streaming/distcp/hadoop archives.
>> >> >
>> >> > thanks
>> >> > mahadev
>> >> >
>> >> > On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <tucu@cloudera.com>
>> >> > wrote:
>> >> >> Agree, it should be a separate maven module.
>> >> >>
>> >> >> And it should be under hadoop-mapreduce-client, right?
>> >> >>
>> >> >> And now that we are on the topic, the same should go for streaming, no?
>> >> >>
>> >> >> Thanks.
>> >> >>
>> >> >> Alejandro
>> >> >>
>> >> >> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <todd@cloudera.com> wrote:
>> >> >>
>> >> >>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <eli@cloudera.com> wrote:
>> >> >>> > Nice work!   I definitely think this should go in 23 and 20x.
>> >> >>> >
>> >> >>> > Agree with JD that it should be in the core code, not contrib.  If
>> >> >>> > it's going to be maintained then we should put it in the core code.
>> >> >>>
>> >> >>> Now that we're all mavenized, though, a separate maven module and
>> >> >>> artifact does make sense IMO - ie "hadoop jar
>> >> >>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
>> >> >>>
>> >> >>> -Todd
>> >> >>> --
>> >> >>> Todd Lipcon
>> >> >>> Software Engineer, Cloudera
>> >> >>>
>> >> >>
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Todd Lipcon
>> >> Software Engineer, Cloudera
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> -Giri
>>
>>
>>
>