hadoop-mapreduce-dev mailing list archives

From Amareshwari Sri Ramadasu <amar...@yahoo-inc.com>
Subject Re: DistCpV2 in 0.23
Date Fri, 26 Aug 2011 04:06:06 GMT
Agree. It should be a separate maven module (and the patch now puts it in a separate maven module).
A top level for hadoop tools would be nice to have, but it will be hard to maintain until the patch
automation runs the tests under tools. Currently we often see changes in HDFS
affecting RAID tests in MapReduce. So, I'm fine with putting the tools under hadoop-mapreduce.

I propose we can have something like the following:

  - hadoop-mapreduce
      - hadoop-mr-client
      - hadoop-yarn
      - hadoop-tools
          - hadoop-streaming
          - hadoop-archives
          - hadoop-distcp
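
For illustration only, the hadoop-tools level of this layout could be expressed as a Maven aggregator POM along the following lines. This is a sketch under assumptions, not the contents of the patch: the groupId and the reuse of the 0.23.0-SNAPSHOT version are guesses, and a real POM would likely also declare a parent.

```xml
<!-- hadoop-tools/pom.xml: hypothetical aggregator for the proposed layout -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Coordinates are assumptions made for this sketch -->
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-tools</artifactId>
  <version>0.23.0-SNAPSHOT</version>

  <!-- "pom" packaging: this module only aggregates its children -->
  <packaging>pom</packaging>

  <!-- Each tool is built as its own module and artifact -->
  <modules>
    <module>hadoop-streaming</module>
    <module>hadoop-archives</module>
    <module>hadoop-distcp</module>
  </modules>
</project>
```

With such a structure, building from the hadoop-tools directory would build the three tool modules together, while each still produces its own artifact.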


@Eli and @JD, we did not replace the old legacy distcp because this is really a complete rewrite,
and we did not want to remove it until users are familiar with the new one.

On 8/26/11 12:51 AM, "Todd Lipcon" <todd@cloudera.com> wrote:

Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
in there as well - ie tools that are downstream of MR and/or HDFS.

On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <mahadev@hortonworks.com> wrote:
> +1 for a separate module in hadoop-mapreduce-project. I think
> hadoop-mapreduce-client might not be the right place for it. We might have
> to pick a new maven module under hadoop-mapreduce-project that could
> host streaming/distcp/hadoop archives.
> thanks
> mahadev
> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <tucu@cloudera.com> wrote:
>> Agree, it should be a separate maven module.
>> And it should be under hadoop-mapreduce-client, right?
>> And now that we are on the topic, the same should go for streaming, no?
>> Thanks.
>> Alejandro
>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <todd@cloudera.com> wrote:
>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <eli@cloudera.com> wrote:
>>> > Nice work!   I definitely think this should go in 23 and 20x.
>>> >
>>> > Agree with JD that it should be in the core code, not contrib.  If
>>> > it's going to be maintained then we should put it in the core code.
>>> Now that we're all mavenized, though, a separate maven module and
>>> artifact does make sense IMO - ie "hadoop jar
>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
>>> -Todd
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera

Todd Lipcon
Software Engineer, Cloudera
