hadoop-mapreduce-dev mailing list archives

From Amareshwari Sri Ramadasu <amar...@yahoo-inc.com>
Subject Re: DistCpV2 in 0.23
Date Fri, 26 Aug 2011 16:37:41 GMT
Agree with Mithun and Robert. DistCp and Tools restructuring are separate tasks. Since the DistCp
code is ready to be committed, it need not wait for the Tools separation from MR/HDFS.
I would say it can go into contrib as the patch is now, and when the tools restructuring happens
it would be just an svn mv.  If there are no issues with this proposal I can commit the code
tomorrow.

Thanks
Amareshwari

On 8/26/11 7:45 PM, "Robert Evans" <evans@yahoo-inc.com> wrote:

I agree with Mithun.  They are related but this goes beyond distcpv2 and should not block
distcpv2 from going in.  It would be very nice, however, to get the layout settled soon so
that we all know where to find something when we want to work on it.

Also +1 to Alejandro's suggestion. I also prefer to keep tools at the trunk level.

Even though HDFS, Common, and MapReduce (and perhaps soon tools) are separate modules right
now, there is still tight coupling between the different pieces, especially with tests.  IMO
until we can reduce that coupling we should treat building and testing Hadoop as a single
project instead of trying to keep them separate.

--Bobby

On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <mithun.radhakrishnan@yahoo.com> wrote:

Would it be acceptable if retooling of tools/ were taken up separately? It sounds to me like
this might be a distinct (albeit related) task.

Mithun


________________________________
From: Giridharan Kesavan <gkesavan@hortonworks.com>
To: mapreduce-dev@hadoop.apache.org
Sent: Friday, August 26, 2011 12:04 PM
Subject: Re: DistCpV2 in 0.23

+1 to Alejandro's

I prefer to keep the hadoop-tools at trunk level.

-Giri

On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <tucu@cloudera.com> wrote:
> I'd suggest putting hadoop-tools either at trunk/ level or having a tools
> aggregator module for hdfs and another for common.
>
> I personally would prefer at trunk/.
>
> Thanks.
>
> Alejandro
>
> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
> amarsri@yahoo-inc.com> wrote:
>
>> Agree. It should be a separate maven module (and the patch puts it as a separate
>> maven module now). A top level for hadoop tools is nice to have, but it
>> becomes hard to maintain until patch automation runs the tests under
>> tools. Currently we often see changes in HDFS affecting RAID tests
>> in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
>>
>> I propose we can have something like the following:
>>
>> trunk/
>>  - hadoop-mapreduce
>>      - hadoop-mr-client
>>      - hadoop-yarn
>>      - hadoop-tools
>>          - hadoop-streaming
>>          - hadoop-archives
>>          - hadoop-distcp
>>
>> Thoughts?
>>
>> @Eli and @JD, we did not replace the old legacy distcp because this is really a
>> complete rewrite, and we did not want to remove it until users are familiar
>> with the new one.
>>
>> On 8/26/11 12:51 AM, "Todd Lipcon" <todd@cloudera.com> wrote:
>>
>> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
>> in there as well - ie tools that are downstream of MR and/or HDFS.
>>
>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <mahadev@hortonworks.com>
>> wrote:
>> > +1 for a separate module in hadoop-mapreduce-project. I think
>> > hadoop-mapreduce-client might not be the right place for it. We might have
>> > to pick a new maven module under hadoop-mapreduce-project that could
>> > host streaming/distcp/hadoop archives.
>> >
>> > thanks
>> > mahadev
>> >
>> > On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <tucu@cloudera.com>
>> wrote:
>> >> Agree, it should be a separate maven module.
>> >>
>> >> And it should be under hadoop-mapreduce-client, right?
>> >>
>> >> And now that we are on the topic, the same should go for streaming, no?
>> >>
>> >> Thanks.
>> >>
>> >> Alejandro
>> >>
>> >> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <todd@cloudera.com>
>> wrote:
>> >>
>> >>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <eli@cloudera.com>
>> wrote:
>> >>> > Nice work!   I definitely think this should go in 23 and 20x.
>> >>> >
>> >>> > Agree with JD that it should be in the core code, not contrib.  If
>> >>> > it's going to be maintained then we should put it in the core code.
>> >>>
>> >>> Now that we're all mavenized, though, a separate maven module and
>> >>> artifact does make sense IMO - ie "hadoop jar
>> >>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
>> >>>
>> >>> -Todd
>> >>> --
>> >>> Todd Lipcon
>> >>> Software Engineer, Cloudera
>> >>>
>> >>
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>>
>



--
-Giri


