Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
From: Allen Wittenauer
Date: Mon, 29 Aug 2011 11:40:50 -0700
Reply-To: mapreduce-dev@hadoop.apache.org
Cc: Mithun Radhakrishnan

I have a feeling this discussion should get moved to common-dev or even to general.

My #1 question is if tools is basically contrib reborn. If not, what makes it different?

On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:

> Some questions on making hadoop-tools top level under trunk,
>
> 1. Should the patches for tools be created against Hadoop Common?
> 2. What will happen to the tools test automation?
>    Will it run as part of Hadoop Common tests?
> 3. Will it introduce a dependency from MapReduce to Common? Or is this taken care of in Mavenization?
>
>
> Thanks
> Amareshwari
>
> On 8/26/11 10:17 PM, "Alejandro Abdelnur" wrote:
>
> Please, don't add more Mavenization work on us (eventually I want to go back
> to coding)
>
> Given that Hadoop is already Mavenized, the patch should be Mavenized.
>
> What will have to be done extra (besides Mavenizing distcp) is to create a
> hadoop-tools module at root level and within it a hadoop-distcp module.
>
> The hadoop-tools POM will look pretty much like the hadoop-common-project
> POM.
>
> The hadoop-distcp POM should follow the hadoop-common POM patterns.
>
> Thanks.
>
> Alejandro
>
> On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <amarsri@yahoo-inc.com> wrote:
>
>> Agree with Mithun and Robert. DistCp and Tools restructuring are separate
>> tasks. Since DistCp code is ready to be committed, it need not wait for the
>> Tools separation from MR/HDFS.
>> I would say it can go into contrib as the patch is now, and when the tools
>> restructuring happens it would be just an svn mv. If there are no issues
>> with this proposal I can commit the code tomorrow.
>>
>> Thanks
>> Amareshwari
>>
>> On 8/26/11 7:45 PM, "Robert Evans" wrote:
>>
>> I agree with Mithun. They are related but this goes beyond distcpv2 and
>> should not block distcpv2 from going in. It would be very nice, however, to
>> get the layout settled soon so that we all know where to find something when
>> we want to work on it.
>>
>> Also +1 for Alejandro's. I also prefer to keep tools at the trunk level.
>>
>> Even though HDFS, Common, and Mapreduce and perhaps soon tools are separate
>> modules right now, there is still tight coupling between the different
>> pieces, especially with tests.
>> IMO until we can reduce that coupling we
>> should treat building and testing Hadoop as a single project instead of
>> trying to keep them separate.
>>
>> --Bobby
>>
>> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" wrote:
>>
>> Would it be acceptable if retooling of tools/ were taken up separately? It
>> sounds to me like this might be a distinct (albeit related) task.
>>
>> Mithun
>>
>>
>> ________________________________
>> From: Giridharan Kesavan
>> To: mapreduce-dev@hadoop.apache.org
>> Sent: Friday, August 26, 2011 12:04 PM
>> Subject: Re: DistCpV2 in 0.23
>>
>> +1 to Alejandro's
>>
>> I prefer to keep the hadoop-tools at trunk level.
>>
>> -Giri
>>
>> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur wrote:
>>> I'd suggest putting hadoop-tools either at trunk/ level or having a tools
>>> aggregator module for hdfs and another for common.
>>>
>>> I personally would prefer trunk/.
>>>
>>> Thanks.
>>>
>>> Alejandro
>>>
>>> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <amarsri@yahoo-inc.com> wrote:
>>>
>>>> Agree. It should be a separate maven module (and the patch puts it in a
>>>> separate maven module now). And a top level for hadoop tools is nice to
>>>> have, but it becomes hard to maintain until patch automation tests run the
>>>> tests under tools. Currently we often see changes in HDFS affecting RAID
>>>> tests in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
>>>>
>>>> I propose we can have something like the following:
>>>>
>>>> trunk/
>>>>  - hadoop-mapreduce
>>>>      - hadoop-mr-client
>>>>      - hadoop-yarn
>>>>      - hadoop-tools
>>>>          - hadoop-streaming
>>>>          - hadoop-archives
>>>>          - hadoop-distcp
>>>>
>>>> Thoughts?
>>>>
>>>> @Eli and @JD, we did not replace the old legacy distcp because this is
>>>> really a complete rewrite and we did not want to remove it until users are
>>>> familiar with the new one.
>>>>
>>>> On 8/26/11 12:51 AM, "Todd Lipcon" wrote:
>>>>
>>>> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
>>>> in there as well - ie tools that are downstream of MR and/or HDFS.
>>>>
>>>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <mahadev@hortonworks.com> wrote:
>>>>> +1 for a separate module in hadoop-mapreduce-project. I think
>>>>> hadoop-mapreduce-client might not be the right place for it. We might have
>>>>> to pick a new maven module under hadoop-mapreduce-project that could
>>>>> host streaming/distcp/hadoop archives.
>>>>>
>>>>> thanks
>>>>> mahadev
>>>>>
>>>>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <tucu@cloudera.com> wrote:
>>>>>> Agree, it should be a separate maven module.
>>>>>>
>>>>>> And it should be under hadoop-mapreduce-client, right?
>>>>>>
>>>>>> And now that we are on the topic, the same should go for streaming, no?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Alejandro
>>>>>>
>>>>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon wrote:
>>>>>>
>>>>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins wrote:
>>>>>>>> Nice work! I definitely think this should go in 23 and 20x.
>>>>>>>>
>>>>>>>> Agree with JD that it should be in the core code, not contrib. If
>>>>>>>> it's going to be maintained then we should put it in the core code.
>>>>>>>
>>>>>>> Now that we're all mavenized, though, a separate maven module and
>>>>>>> artifact does make sense IMO - ie "hadoop jar
>>>>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
>>>>>>>
>>>>>>> -Todd
>>>>>>> --
>>>>>>> Todd Lipcon
>>>>>>> Software Engineer, Cloudera
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>
>> --
>> -Giri