hadoop-mapreduce-issues mailing list archives

From "Avner BenHanoch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5329) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler
Date Sun, 29 Sep 2013 13:26:26 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781364#comment-13781364 ]

Avner BenHanoch commented on MAPREDUCE-5329:
--------------------------------------------

Hi Siddharth,

All sounds great!  Thank you.

One last comment regarding *mapreduce.job.shuffle.provider.plugin.classes*: can we rename
it to *mapreduce.job.shuffle.provider.services* and have it list the *service names* instead of
the *service classes*?  This will prevent confusion between the hadoop-1 and hadoop-2 config names,
which is good for me.  It will also address your concern about services that are not
running on a specific NM.

In this case the code will simply be something like this:
{code}
// add external shuffle-providers - if any
Collection<String> shuffleProviders = conf.getStringCollection(
    MRJobConfig.SHUFFLE_PROVIDER_SERVICES);

for (final String provider : shuffleProviders) {
  if (provider.equals(ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID)) {
    // skip built-in shuffle-provider that was already inserted with shuffle secret key
    continue;
  }
  LOG.info("Adding " + provider + " to serviceData");
  // Please note, the shuffleProvider needs to be able to work with the host:port
  // information provided by the AM (i.e. plugins which require custom location /
  // other configuration are not supported)
  serviceData.put(provider, ByteBuffer.allocate(0)); // This only serves for INIT_APP notifications
}
{code}
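
For anyone who wants to try the filtering logic without Hadoop on the classpath, here is a minimal, self-contained sketch of the same loop. It is only an illustration under stated assumptions: `ShuffleProviderSketch`, `buildServiceData`, and the provider name `my_shuffle` are hypothetical, and `BUILTIN_SHUFFLE_SERVICE_ID` is a placeholder standing in for `ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID` (its value here is not taken from the patch).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

class ShuffleProviderSketch {
  // Placeholder for ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID (assumed value).
  static final String BUILTIN_SHUFFLE_SERVICE_ID = "mapreduce_shuffle";

  /**
   * Builds the serviceData map: the built-in provider keeps its secret,
   * every other configured provider gets an empty buffer so that it is
   * still registered and can receive the init notification.
   */
  static Map<String, ByteBuffer> buildServiceData(
      Collection<String> shuffleProviders, ByteBuffer builtinSecret) {
    Map<String, ByteBuffer> serviceData = new LinkedHashMap<>();
    serviceData.put(BUILTIN_SHUFFLE_SERVICE_ID, builtinSecret);

    for (final String provider : shuffleProviders) {
      if (provider.equals(BUILTIN_SHUFFLE_SERVICE_ID)) {
        continue; // already inserted above with the shuffle secret
      }
      serviceData.put(provider, ByteBuffer.allocate(0));
    }
    return serviceData;
  }

  public static void main(String[] args) {
    // "my_shuffle" is a hypothetical third-party service name.
    Map<String, ByteBuffer> data = buildServiceData(
        Arrays.asList(BUILTIN_SHUFFLE_SERVICE_ID, "my_shuffle"),
        ByteBuffer.wrap(new byte[] {1, 2, 3}));
    for (Map.Entry<String, ByteBuffer> e : data.entrySet()) {
      System.out.println(e.getKey() + ": " + e.getValue().remaining() + " bytes");
    }
  }
}
```

Listing the built-in service name in the config is harmless in this sketch: it is skipped in the loop, so its secret is never overwritten by an empty buffer.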


> APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5329
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5329
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.1.0-beta, 2.0.6-alpha
>            Reporter: Avner BenHanoch
>             Fix For: trunk
>
>         Attachments: MAPREDUCE-5329.patch
>
>
> APPLICATION_INIT is never sent to AuxServices other than the built-in ShuffleHandler.
> This means that 3rd party ShuffleProvider(s) will not be able to function, because APPLICATION_INIT
> enables the AuxiliaryService to map jobId->userId. This is needed for properly finding
> the MOFs (map output files) of a job per reducers' requests.
> NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events due to a hard-coded
> expression in the hadoop code. The current TaskAttemptImpl.java code explicitly calls
> serviceData.put(ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, ...) and ignores any additional AuxiliaryService.
> As a result, only the built-in ShuffleHandler will get APPLICATION_INIT events.  Any 3rd party
> AuxiliaryService will never get APPLICATION_INIT events.
> I think a solution can be in one of two ways:
> 1. Change TaskAttemptImpl.java to loop on all Auxiliary Services and register each of
> them, by calling serviceData.put (…) in a loop.
> 2. Change AuxServices.java similar to the fix in MAPREDUCE-2668 "APPLICATION_STOP is
> never sent to AuxServices".  This means that when the 'handle' method gets an APPLICATION_INIT
> event, it will demultiplex it to all Aux Services regardless of the value in event.getServiceID().
> I prefer the 2nd solution.  I welcome any ideas.  I can provide the needed patch
> for any option that people like.
> See [Pluggable Shuffle in Hadoop documentation|http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html]
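
The second option in the quoted description (demultiplexing APPLICATION_INIT to every auxiliary service) could look roughly like the sketch below. This is not the YARN code: `AuxService`, `RecordingService`, and `handleAppInit` are hypothetical stand-ins, used only to show the idea of ignoring the service id on the event and notifying all registered services so that each can build its own jobId->userId mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AuxInitDemuxSketch {
  /** Hypothetical stand-in for an auxiliary service; not the real YARN class. */
  interface AuxService {
    void initApp(String user, String appId);
  }

  /** Remembers which user owns which application, as a shuffle provider would need. */
  static class RecordingService implements AuxService {
    final Map<String, String> userByApp = new LinkedHashMap<>();

    @Override
    public void initApp(String user, String appId) {
      userByApp.put(appId, user);
    }
  }

  /** Forwards the init notification to every registered service, whatever the event's service id. */
  static void handleAppInit(
      Map<String, ? extends AuxService> services, String user, String appId) {
    for (AuxService service : services.values()) {
      service.initApp(user, appId);
    }
  }

  public static void main(String[] args) {
    // Service names and the application id below are made up for the example.
    Map<String, RecordingService> services = new LinkedHashMap<>();
    services.put("mapreduce_shuffle", new RecordingService());
    services.put("my_shuffle", new RecordingService());

    handleAppInit(services, "alice", "application_0001");

    for (Map.Entry<String, RecordingService> e : services.entrySet()) {
      System.out.println(e.getKey() + " knows " + e.getValue().userByApp);
    }
  }
}
```

The trade-off against the first option is that every service receives every init notification, including services the job never asked for.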



--
This message was sent by Atlassian JIRA
(v6.1#6144)
