hadoop-yarn-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler
Date Fri, 14 Jun 2013 17:42:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683579#comment-13683579 ]

Siddharth Seth commented on YARN-802:

With YARN, a new AM (ApplicationMaster) is started per job. The initApp in the NM is per app, so
each job/app can choose which shuffle provider it wants to use. The shuffle service configured
for an AM will therefore be specific to a single job.
From MAPREDUCE-4049:
bq.  A shuffle consumer instance will only contact one of the shuffle providers and will request
its desired files only from this provider.

I'm assuming a single job will only use one shuffle provider - or do you see a situation where
multiple shuffle providers can serve data to a single job?

In case of multiple jobs being run by a single AM - this gets more complicated, and we may
need to initialize multiple providers.
> APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler
> -----------------------------------------------------------------------------------
>                 Key: YARN-802
>                 URL: https://issues.apache.org/jira/browse/YARN-802
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: applications, nodemanager
>    Affects Versions: 2.0.4-alpha
>            Reporter: Avner BenHanoch
> APPLICATION_INIT is never sent to AuxServices other than the built-in ShuffleHandler.
> This means that 3rd party ShuffleProvider(s) will not be able to function, because APPLICATION_INIT
> enables the AuxiliaryService to map jobId->userId. This is needed for properly finding
> the MOFs of a job per reducers' requests.
> NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events, due to a hard-coded
> expression in the Hadoop code. The current TaskAttemptImpl.java code explicitly calls
> serviceData.put(ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, ...) and ignores any additional
> AuxiliaryService. As a result, only the built-in ShuffleHandler will get APPLICATION_INIT
> events. Any 3rd party AuxiliaryService will never get APPLICATION_INIT events.
> I think this can be solved in one of two ways:
> 1. Change TaskAttemptImpl.java to loop over all Auxiliary Services and register each of
> them, by calling serviceData.put(…) in a loop.
> 2. Change AuxServices.java similarly to the fix in MAPREDUCE-2668, "APPLICATION_STOP is
> never sent to AuxServices". This means that when the 'handle' method gets an APPLICATION_INIT
> event, it will demultiplex it to all Aux Services regardless of the value in event.getServiceID().
> I prefer the 2nd solution. I welcome any ideas. I can provide the needed patch
> for whichever option people prefer.
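
To make the difference between the current behaviour and the 2nd proposal concrete, here is a minimal, self-contained Java sketch. It does not use the real Hadoop classes: AuxService, RecordingService, Dispatcher, the service IDs and the app/user names are all hypothetical stand-ins invented for illustration, and the real AuxServices.handle method may be structured differently. It only shows the idea of delivering APPLICATION_INIT to one named service versus demultiplexing it to every registered service.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AuxDispatchSketch {
    enum EventType { APPLICATION_INIT, APPLICATION_STOP }

    /** Hypothetical stand-in for an auxiliary service; NOT the real Hadoop interface. */
    interface AuxService {
        void initApp(String user, String appId);
        void stopApp(String appId);
    }

    /** Toy service that records appId -> user, the mapping a shuffle provider needs. */
    static class RecordingService implements AuxService {
        final Map<String, String> userByApp = new LinkedHashMap<>();

        @Override
        public void initApp(String user, String appId) {
            userByApp.put(appId, user);
        }

        @Override
        public void stopApp(String appId) {
            userByApp.remove(appId);
        }
    }

    /** Hypothetical dispatcher modelling the two delivery strategies. */
    static class Dispatcher {
        private final Map<String, AuxService> services = new LinkedHashMap<>();

        void register(String serviceId, AuxService service) {
            services.put(serviceId, service);
        }

        /** Behaviour described in the report: INIT reaches only the named service. */
        void initTargeted(String serviceId, String user, String appId) {
            AuxService service = services.get(serviceId);
            if (service != null) {
                service.initApp(user, appId);
            }
        }

        /** Proposal 2: deliver the event to every service, ignoring any service ID. */
        void handleBroadcast(EventType type, String user, String appId) {
            for (AuxService service : services.values()) {
                switch (type) {
                    case APPLICATION_INIT:
                        service.initApp(user, appId);
                        break;
                    case APPLICATION_STOP:
                        service.stopApp(appId);
                        break;
                }
            }
        }
    }

    public static void main(String[] args) {
        Dispatcher dispatcher = new Dispatcher();
        RecordingService builtin = new RecordingService();
        RecordingService thirdParty = new RecordingService();
        dispatcher.register("builtin_shuffle", builtin);        // hypothetical ID
        dispatcher.register("third_party_shuffle", thirdParty); // hypothetical ID

        // Targeted delivery: the third-party service never learns the user.
        dispatcher.initTargeted("builtin_shuffle", "alice", "app_1");
        System.out.println("targeted, third party knows app_1: "
                + thirdParty.userByApp.containsKey("app_1"));

        // Broadcast delivery: every registered service sees the event.
        dispatcher.handleBroadcast(EventType.APPLICATION_INIT, "bob", "app_2");
        System.out.println("broadcast, third party knows app_2: "
                + thirdParty.userByApp.containsKey("app_2"));
    }
}
```

One trade-off of the broadcast approach in this sketch is that every service is initialized for every application, including services that a given job never uses.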

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
