hadoop-yarn-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler
Date Fri, 14 Jun 2013 17:42:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683579#comment-13683579 ]

Siddharth Seth commented on YARN-802:
-------------------------------------

With YARN, a new AM (Application) is started per job. The initApp in the NM is per app - so
each job/app can choose which shuffle provider it wants to use. The shuffle service configured
for an AM will be specific to a single job only.
From MAPREDUCE-4049:
  "A shuffle consumer instance will only contact one of the shuffle providers and will request
  its desired files only from this provider."

I'm assuming a single job will only use one shuffle provider. Or do you see a situation where
multiple shuffle providers can serve data to a single job?

In case of multiple jobs being run by a single AM - this gets more complicated, and we may
need to initialize multiple providers.
                
> APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-802
>                 URL: https://issues.apache.org/jira/browse/YARN-802
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: applications, nodemanager
>    Affects Versions: 2.0.4-alpha
>            Reporter: Avner BenHanoch
>
> APPLICATION_INIT is never sent to AuxServices other than the built-in ShuffleHandler.
> This means that 3rd party ShuffleProvider(s) will not be able to function, because
> APPLICATION_INIT enables the AuxiliaryService to map jobId->userId. This is needed for
> properly finding the MOFs of a job per reducers' requests.
> NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events due to a hard-coded
> expression in the Hadoop code. The current TaskAttemptImpl.java code explicitly calls
> serviceData.put(ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, ...) and ignores any additional
> AuxiliaryService. As a result, only the built-in ShuffleHandler will get APPLICATION_INIT
> events. Any 3rd party AuxiliaryService will never get APPLICATION_INIT events.
> I think a solution can be in one of two ways:
> 1. Change TaskAttemptImpl.java to loop over all Auxiliary Services and register each of
> them, by calling serviceData.put (…) in a loop.
> 2. Change AuxServices.java similar to the fix in MAPREDUCE-2668 "APPLICATION_STOP is
> never sent to AuxServices". This means that when the 'handle' method gets an
> APPLICATION_INIT event, it will demultiplex it to all Aux Services regardless of the value
> in event.getServiceID().
> I prefer the 2nd solution. I welcome any ideas. I can provide the needed patch
> for any option that people like.
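
To make the difference between the current behaviour and the 2nd option concrete, here is a
minimal sketch. It deliberately does not use the real Hadoop classes: AuxInitDemux, AuxService,
RecordingService, handleTargeted, handleBroadcast and the service IDs are all hypothetical
stand-ins chosen for illustration, and the real AuxServices.handle signature is not reproduced.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; NOT the real Hadoop YARN classes.
class AuxInitDemux {

    enum EventType { APPLICATION_INIT, APPLICATION_STOP }

    /** Simplified stand-in for an auxiliary service that can be told about a new app. */
    interface AuxService {
        void initApp(String user, String appId);
    }

    /** Records appId -> user, the mapping the issue says a shuffle provider needs. */
    static final class RecordingService implements AuxService {
        final Map<String, String> userByApp = new LinkedHashMap<>();

        @Override
        public void initApp(String user, String appId) {
            userByApp.put(appId, user);
        }
    }

    private final Map<String, AuxService> services = new LinkedHashMap<>();

    void register(String serviceId, AuxService service) {
        services.put(serviceId, service);
    }

    /** Behaviour described in the issue: only the service named by serviceId is initialised. */
    void handleTargeted(EventType type, String serviceId, String user, String appId) {
        if (type != EventType.APPLICATION_INIT) {
            return;
        }
        AuxService service = services.get(serviceId);
        if (service != null) {
            service.initApp(user, appId);
        }
    }

    /** Option 2: ignore the service ID and fan APPLICATION_INIT out to every service. */
    void handleBroadcast(EventType type, String user, String appId) {
        if (type != EventType.APPLICATION_INIT) {
            return;
        }
        for (AuxService service : services.values()) {
            service.initApp(user, appId);
        }
    }

    public static void main(String[] args) {
        AuxInitDemux demux = new AuxInitDemux();
        RecordingService builtin = new RecordingService();
        RecordingService thirdParty = new RecordingService();
        demux.register("builtin_shuffle", builtin);          // hypothetical ID
        demux.register("third_party_shuffle", thirdParty);   // hypothetical ID

        demux.handleTargeted(EventType.APPLICATION_INIT, "builtin_shuffle", "alice", "job_1");
        System.out.println("targeted:  third party knows job_1? "
                + thirdParty.userByApp.containsKey("job_1"));

        demux.handleBroadcast(EventType.APPLICATION_INIT, "bob", "job_2");
        System.out.println("broadcast: third party knows job_2? "
                + thirdParty.userByApp.containsKey("job_2"));
    }
}
```

In this sketch the broadcast variant carries no per-service payload. In the real code the
service data is keyed by service ID, so a patch along these lines would presumably also have to
decide what data, if any, the services that were not addressed receive.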

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
