hadoop-mapreduce-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: yarn-site.xml and aux-services
Date Fri, 23 Aug 2013 17:47:02 GMT
Harsh,

Thanks for the clarification. I would find it very convenient in this case to have my custom
jars available in HDFS, but I can see the added complexity needed for YARN to maintain a cache
of those on local disk.

What about having the tasks themselves start the per-node service as a child process? I've
been told that the NM kills the process group, but won't setpgrp() circumvent that?
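
A minimal sketch of the child-process idea, in Python for brevity and assuming a POSIX host. The helper name spawn_detached is hypothetical. It starts the child in a new session via setsid(), which also puts it in a new process group, much as setpgrp() would. Whether a child detached this way actually survives NodeManager cleanup is exactly the open question above and is not something this sketch establishes.

```python
import subprocess


def spawn_detached(argv):
    """Start argv as a child in its own session and process group.

    start_new_session=True makes the child call setsid() before exec,
    so its process group ID differs from the parent's.
    """
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
```

After the call, os.getpgid(child.pid) equals child.pid rather than the parent's group, so a signal sent only to the parent's process group would not reach the child.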

Even given that, would the child process of one task have proper environment and permission
to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the
persistent service serves up mapper output files to the reducers across the network:
1) AM spawns "mapper-like" tasks around the cluster
2) Each mapper-like task on a given node launches a "persistent service" child, but only if
one is not already running.
3) Each mapper-like task writes one or more output files, and informs the service of those
files (along with AM-id, Task-id etc).
4) AM spawns "reducer-like" tasks around the cluster.
5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to
services on those nodes to read the data.
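
Step 2 ("only if one is not already running") could be sketched as a race to bind a well-known port: whichever task binds first becomes the service, and the others treat the bind failure as "already running". This is only one possible approach, written in Python for brevity. The function name and the choice of a fixed TCP port are assumptions, not anything YARN provides.

```python
import socket


def try_become_service(port, host="127.0.0.1"):
    """Return a listening socket if this process won the race, else None.

    Binding is atomic at the OS level, so at most one process per node
    can hold the port at a time.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError:
        # Most likely another task on this node already owns the port.
        sock.close()
        return None
    sock.listen()
    return sock
```

A task that gets None back would skip the launch and simply connect to the existing service on that port.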

There are some details missing, like how the lifetime of the temporary files is controlled
to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how
the reducer-like tasks are informed of which nodes have data.
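
One hedged way to picture the file-lifetime detail: the service keeps a registry keyed by application ID, so files outlive the task that wrote them and are deleted together when the AM reports that it is finished. The class and method names below are hypothetical, and the sketch is in Python for brevity. How the service would learn of AM exit is left open, as it is above.

```python
import os
from collections import defaultdict


class OutputRegistry:
    """Tracks output files per application and per task."""

    def __init__(self):
        # app_id -> {task_id: [paths]}
        self._files = defaultdict(dict)

    def register(self, app_id, task_id, paths):
        """Record files written by one mapper-like task (step 3)."""
        self._files[app_id].setdefault(task_id, []).extend(paths)

    def lookup(self, app_id, task_id):
        """Return the paths a reducer-like task may ask for (step 5)."""
        return list(self._files.get(app_id, {}).get(task_id, []))

    def cleanup(self, app_id):
        """Delete every file of an application; return how many were removed."""
        removed = 0
        for paths in self._files.pop(app_id, {}).values():
            for path in paths:
                try:
                    os.remove(path)
                    removed += 1
                except FileNotFoundError:
                    pass  # already gone, nothing to do
        return removed
```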

John


-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Friday, August 23, 2013 11:00 AM
To: <user@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

The general practice is to install your deps into a custom location such as /opt/john-jars,
and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the
aux-services list. You do need to take care of keeping the jar versions under /opt/john-jars/
consistent across the cluster, though.
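
A small sanity check for the configuration half of this, sketched in Python under the assumption that only the two property patterns discussed in this thread matter (yarn.nodemanager.aux-services and yarn.nodemanager.aux-services.<id>.class). The function names are hypothetical. The check reads the XML text of a yarn-site.xml and maps each listed service ID to its configured class, or to None when the class entry is missing.

```python
import xml.etree.ElementTree as ET

AUX_KEY = "yarn.nodemanager.aux-services"


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the XML text."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def aux_service_classes(xml_text):
    """Map each aux-service ID to its class name, or None if not configured."""
    props = read_properties(xml_text)
    ids = [s.strip() for s in props.get(AUX_KEY, "").split(",") if s.strip()]
    return {sid: props.get(f"{AUX_KEY}.{sid}.class") for sid in ids}
```

Any ID that maps to None is listed under aux-services but has no implementation class configured for it.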

I think it may be a neat idea to have the jars placed on HDFS or any other DFS, with yarn-site.xml
indicating the location plus the class to load, similar to HBase co-processors. But I'll defer
to Vinod on whether this would be a good thing to do.

(I know the very next thing people will ask for, given such an ability, is hot code upgrades...)

On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <john.lilley@redpoint.net> wrote:
> Are there recommended conventions for adding additional code to a 
> stock Hadoop install?
>
> It would be nice if we could piggyback on whatever mechanisms are used 
> to distribute hadoop itself around the cluster.
>
> john
>
>
>
> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
> Sent: Thursday, August 22, 2013 6:25 PM
> To: user@hadoop.apache.org
> Subject: Re: yarn-site.xml and aux-services
>
> Auxiliary services are essentially administrator-configured services. So, 
> they have to be set up at install time - before NM is started.
>
>
>
> +Vinod
>
>
>
> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley 
> <john.lilley@redpoint.net>
> wrote:
>
> Following up on this, how exactly does one *install* the jar(s) for 
> an auxiliary service?  Can it be shipped out with the LocalResources of an AM?
> MapReduce's aux-service is presumably installed with Hadoop and is 
> just sitting there in the right place, but if one wanted to make a 
> whole new aux-service that belonged with an AM, how would one do it?
>
> John
>
>
> -----Original Message-----
> From: John Lilley [mailto:john.lilley@redpoint.net]
> Sent: Wednesday, June 05, 2013 11:41 AM
> To: user@hadoop.apache.org
> Subject: RE: yarn-site.xml and aux-services
>
> Wow, thanks.  Is this documented anywhere other than the code?  I hate 
> to waste y'alls time on things that can be RTFMed.
> John
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Wednesday, June 05, 2013 9:35 AM
> To: <user@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> John,
>
> The format is ID and sub-config based:
>
> First, you define an ID for the service, like the string "foo". This is 
> the ID that applications may look up in the container response map we 
> discussed in another thread (around the shuffle handler).
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>foo</value>
> </property>
>
> Then you define an actual implementation class for that ID "foo", like so:
>
> <property>
>     <name>yarn.nodemanager.aux-services.foo.class</name>
>     <value>com.mypack.MyAuxServiceClassForFoo</value>
> </property>
>
> If you have multiple services foo and bar, then it would look like 
> this (comma-separated IDs and individual configs):
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>foo,bar</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services.foo.class</name>
>     <value>com.mypack.MyAuxServiceClassForFoo</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services.bar.class</name>
>     <value>com.mypack.MyAuxServiceClassForBar</value>
> </property>
>
> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <john.lilley@redpoint.net>
> wrote:
>> Good, I was hoping that would be the case.  But what are the 
>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>> A scoped class name?  Or a key string into some map elsewhere?
>>
>> e.g. like:
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>mapreduce.shuffle</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>myauxserviceclassname</value>
>> </property>
>>
>> Concerning auxiliary services -- do they communicate with NodeManager 
>> via RPC?  Is there an interface to implement?  How are they opened 
>> and closed with NodeManager?
>>
>> Thanks
>> John
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Tuesday, June 04, 2013 11:58 PM
>> To: <user@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> Yes, that's what this is for. You can implement, pass in and use your 
>> own AuxService. It needs to be on the NodeManager CLASSPATH to run 
>> (and the NM has to be restarted to apply it).
>>
>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley 
>> <john.lilley@redpoint.net>
>> wrote:
>>> I notice this in yarn-site.xml:
>>>
>>>   <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>mapreduce.shuffle</value>
>>>     <description>shuffle service that needs to be set for Map Reduce to run</description>
>>>   </property>
>>>
>>> Is this a general-purpose hook?
>>>
>>> Can I tell yarn to run *my* per-node service?
>>>
>>> Is there some other way (within the recommended Hadoop framework) to 
>>> run a per-node service that exists during the lifetime of the 
>>> NodeManager?
>>>
>>>
>>>
>>> John Lilley
>>>
>>> Chief Architect, RedPoint Global Inc.
>>>
>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>
>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>
>>> Skype: jlilley.redpoint | john.lilley@redpoint.net | 
>>> www.redpoint.net
>>>
>>>
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J
>
>
>
>
> --
> +Vinod
> Hortonworks Inc.
> http://hortonworks.com/



--
Harsh J
