From: Harsh J <harsh@cloudera.com>
Date: Wed, 4 Sep 2013 11:35:29 +0530
Subject: Re: yarn-site.xml and aux-services
To: user@hadoop.apache.org

> Thanks for the clarification. I would find it very convenient in this
> case to have my custom jars available in HDFS, but I can see the added
> complexity needed for YARN to maintain a cache of those on local disk.

We could class-load directly from HDFS, like HBase Co-Processors do.
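In rough terms, that could look like the sketch below (untested and
only illustrative: it copies the jar down to local disk first, since a
plain URLClassLoader cannot read hdfs:// URLs without a registered
stream handler; the path and class names are made-up placeholders):

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClassLoading {
  // Fetch a jar from HDFS and load one of its classes locally.
  public static Class<?> loadFromHdfs(String hdfsJar, String className)
      throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Copy to a local scratch location before class-loading.
    Path local = new Path("/tmp", new Path(hdfsJar).getName());
    fs.copyToLocalFile(new Path(hdfsJar), local);
    URLClassLoader loader = new URLClassLoader(
        new URL[] { new File(local.toString()).toURI().toURL() },
        HdfsClassLoading.class.getClassLoader());
    return loader.loadClass(className);
  }
}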
> Consider a scenario analogous to the MR shuffle, where the persistent
> service serves up mapper output files to the reducers across the
> network:

Isn't this more complex than just running a dedicated service all the
time, and/or implementing a way to spawn/end a dedicated service
temporarily? I'd sooner try to implement such a thing than have my
containers carry more logic.

On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <john.lilley@redpoint.net> wrote:
> Harsh,
>
> Thanks for the clarification. I would find it very convenient in this
> case to have my custom jars available in HDFS, but I can see the added
> complexity needed for YARN to maintain a cache of those on local disk.
>
> What about having the tasks themselves start the per-node service as a
> child process? I've been told that the NM kills the process group, but
> won't setgrp() circumvent that?
>
> Even given that, would the child process of one task have the proper
> environment and permissions to act on behalf of other tasks? Consider
> a scenario analogous to the MR shuffle, where the persistent service
> serves up mapper output files to the reducers across the network:
> 1) The AM spawns "mapper-like" tasks around the cluster.
> 2) Each mapper-like task on a given node launches a "persistent
> service" child, but only if one is not already running.
> 3) Each mapper-like task writes one or more output files, and informs
> the service of those files (along with AM-id, Task-id, etc).
> 4) The AM spawns "reducer-like" tasks around the cluster.
> 5) Each reducer-like task is told which nodes contain "mapper" result
> data, and connects to the services on those nodes to read the data.
>
> There are some details missing, like how the lifetime of the temporary
> files is controlled to extend beyond the mapper-like task lifetime but
> still be cleaned up on AM exit, and how the reducer-like tasks are
> informed of which nodes have data.
>
> John
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Friday, August 23, 2013 11:00 AM
> To: user@hadoop.apache.org
> Subject: Re: yarn-site.xml and aux-services
>
> The general practice is to install your deps into a custom location
> such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars,
> while also configuring the classes under the aux-services list. You do
> need to take care of deploying the jar versions under /opt/john-jars/
> across the cluster yourself, though.
>
> I think it may be a neat idea to have the jars be placed on HDFS or
> any other DFS, with yarn-site.xml indicating the location plus the
> class to load - similar to HBase co-processors. But I'll defer to
> Vinod on whether this would be a good thing to do.
>
> (I know the next thing people will ask for with such an ability is
> hot-code-upgrades...)
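> For what it's worth, a class configured under the aux-services list is
> an AuxiliaryService implementation - roughly the shape below (a
> minimal sketch against the Hadoop 2.x
> org.apache.hadoop.yarn.server.api.AuxiliaryService API; the "foo" name
> and the empty bodies are placeholders):
>
> import java.nio.ByteBuffer;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.yarn.server.api.ApplicationInitializationContext;
> import org.apache.hadoop.yarn.server.api.ApplicationTerminationContext;
> import org.apache.hadoop.yarn.server.api.AuxiliaryService;
>
> public class MyAuxServiceClassForFoo extends AuxiliaryService {
>
>   public MyAuxServiceClassForFoo() {
>     super("foo"); // must match the ID configured in yarn-site.xml
>   }
>
>   @Override
>   protected void serviceInit(Configuration conf) throws Exception {
>     // Runs once, inside the NM JVM, when the NM starts up.
>     super.serviceInit(conf);
>   }
>
>   @Override
>   public void initializeApplication(ApplicationInitializationContext ctx) {
>     // Called when an application's first container lands on this node.
>   }
>
>   @Override
>   public void stopApplication(ApplicationTerminationContext ctx) {
>     // Called when the application completes; clean up per-app state.
>   }
>
>   @Override
>   public ByteBuffer getMetaData() {
>     // Handed back to AMs in the container-start response, keyed "foo".
>     return ByteBuffer.allocate(0);
>   }
> }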
> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <john.lilley@redpoint.net> wrote:
>> Are there recommended conventions for adding additional code to a
>> stock Hadoop install?
>>
>> It would be nice if we could piggyback on whatever mechanisms are
>> used to distribute Hadoop itself around the cluster.
>>
>> john
>>
>>
>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>> Sent: Thursday, August 22, 2013 6:25 PM
>> To: user@hadoop.apache.org
>> Subject: Re: yarn-site.xml and aux-services
>>
>> Auxiliary services are essentially administrator-configured services.
>> So, they have to be set up at install time - before the NM is
>> started.
>>
>> +Vinod
>>
>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley <john.lilley@redpoint.net> wrote:
>>
>> Following up on this, how exactly does one *install* the jar(s) for
>> an auxiliary service? Can it be shipped out with the LocalResources
>> of an AM? MapReduce's aux-service is presumably installed with Hadoop
>> and is just sitting there in the right place, but if one wanted to
>> make a whole new aux-service that belonged with an AM, how would one
>> do it?
>>
>> John
>>
>>
>> -----Original Message-----
>> From: John Lilley [mailto:john.lilley@redpoint.net]
>> Sent: Wednesday, June 05, 2013 11:41 AM
>> To: user@hadoop.apache.org
>> Subject: RE: yarn-site.xml and aux-services
>>
>> Wow, thanks. Is this documented anywhere other than the code? I hate
>> to waste y'all's time on things that can be RTFMed.
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Wednesday, June 05, 2013 9:35 AM
>> To: user@hadoop.apache.org
>> Subject: Re: yarn-site.xml and aux-services
>>
>> John,
>>
>> The format is ID and sub-config based:
>>
>> First, you define an ID as a service, like the string "foo". This is
>> the ID the applications may look up in their container responses map
>> we discussed over another thread (around the shuffle handler).
>>
>> <property>
>>   <name>yarn.nodemanager.aux-services</name>
>>   <value>foo</value>
>> </property>
>>
>> Then you define an actual implementation class for that ID "foo",
>> like so:
>>
>> <property>
>>   <name>yarn.nodemanager.aux-services.foo.class</name>
>>   <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>>
>> If you have multiple services foo and bar, then it would appear like
>> the below (comma-separated IDs and individual configs):
>>
>> <property>
>>   <name>yarn.nodemanager.aux-services</name>
>>   <value>foo,bar</value>
>> </property>
>>
>> <property>
>>   <name>yarn.nodemanager.aux-services.foo.class</name>
>>   <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>>
>> <property>
>>   <name>yarn.nodemanager.aux-services.bar.class</name>
>>   <value>com.mypack.MyAuxServiceClassForBar</value>
>> </property>
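>> On the application side, a sketch of the corresponding lookup (Hadoop
>> 2.x protocol records; untested, and assumes the "foo" service from
>> the config above):
>>
>> import java.nio.ByteBuffer;
>> import java.util.Map;
>> import org.apache.hadoop.yarn.api.protocolrecords.StartContainersResponse;
>>
>> public class AuxServiceMetaLookup {
>>   public static ByteBuffer lookupFoo(StartContainersResponse response) {
>>     // The NM returns each aux-service's getMetaData() payload, keyed
>>     // by its configured service ID.
>>     Map<String, ByteBuffer> services = response.getAllServicesMetaData();
>>     return services.get("foo");
>>   }
>> }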
>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <john.lilley@redpoint.net> wrote:
>>> Good, I was hoping that would be the case. But what are the
>>> mechanics of it? Do I just add another entry? And what exactly is
>>> "mapreduce.shuffle"? A scoped class name? Or a key string into some
>>> map elsewhere?
>>>
>>> e.g. like:
>>>
>>> <property>
>>>   <name>yarn.nodemanager.aux-services</name>
>>>   <value>mapreduce.shuffle</value>
>>> </property>
>>>
>>> <property>
>>>   <name>yarn.nodemanager.aux-services</name>
>>>   <value>myauxserviceclassname</value>
>>> </property>
>>>
>>> Concerning auxiliary services -- do they communicate with the
>>> NodeManager via RPC? Is there an interface to implement? How are
>>> they opened and closed by the NodeManager?
>>>
>>> Thanks
>>> John
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>> To: user@hadoop.apache.org
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Yes, that's what this is for. You can implement, pass in, and use
>>> your own AuxService. It needs to be on the NodeManager CLASSPATH to
>>> run (and the NM has to be restarted to apply it).
>>>
>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <john.lilley@redpoint.net> wrote:
>>>> I notice this in yarn-site.xml:
>>>>
>>>> <property>
>>>>   <name>yarn.nodemanager.aux-services</name>
>>>>   <value>mapreduce.shuffle</value>
>>>>   <description>shuffle service that needs to be set for Map Reduce
>>>>   to run</description>
>>>> </property>
>>>>
>>>> Is this a general-purpose hook?
>>>>
>>>> Can I tell YARN to run *my* per-node service?
>>>>
>>>> Is there some other way (within the recommended Hadoop framework)
>>>> to run a per-node service that exists during the lifetime of the
>>>> NodeManager?
>>>>
>>>> John Lilley
>>>> Chief Architect, RedPoint Global Inc.
>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>> T: +1 303 541 1516 | M: +1 720 938 5761 | F: +1 781-705-2077
>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>>>
>>> --
>>> Harsh J
>>
>> --
>> Harsh J
>>
>> --
>> +Vinod
>> Hortonworks Inc.
>> http://hortonworks.com/
>
> --
> Harsh J

--
Harsh J