hadoop-mapreduce-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5974) Allow map output collector fallback
Date Wed, 23 Jul 2014 00:24:39 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071174#comment-14071174 ]

Chris Douglas commented on MAPREDUCE-5974:

bq. Doing fallback as the records are emitted would be pretty neat, but may also be somewhat
difficult. [snip]

*nod* Fair enough, though if each MapTask is making independent decisions about the collector,
they still need to agree on the format for the shuffle. Spilling one collector to disk and
changing strategies should be compatible, assuming there isn't a different format for intermediate
spills. But yeah, this is very abstract, given the use cases we have.

If the goal is to support a fallback collector when native libs aren't available, then given
the dependency on the intermediate format, should the swap be internal to the native collector,
even in init? If the interface were like the serialization framework, then one might use the key
type, etc. to pick the most appropriate collector. As failover, I'm struggling to come up with a
case that's not covered by making this an internal detail of the native collector.
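
To make the "internal detail" alternative concrete, here is a minimal sketch, not code from the attached patch. The `Collector` interface, both collector classes, and the `nativeLibsLoaded` flag are hypothetical stand-ins; the real MapOutputCollector interface has a different shape. The point is only that the swap happens inside init, so the job configuration names a single collector.

```java
import java.util.ArrayList;
import java.util.List;

public class InternalFallbackSketch {
    /** Hypothetical, simplified collector interface (not Hadoop's). */
    interface Collector {
        void init();
        void collect(String key, String value);
    }

    /** Plain-Java collector used as the fallback. */
    static class JavaCollector implements Collector {
        final List<String> records = new ArrayList<>();
        @Override public void init() {}
        @Override public void collect(String key, String value) {
            records.add(key + "=" + value);
        }
    }

    /** "Native" collector that decides in init() whether to delegate. */
    static class NativeCollector implements Collector {
        private final boolean nativeLibsLoaded; // assumption: supplied by some loader check
        private Collector delegate;             // non-null only when falling back
        private int nativeRecords;              // placeholder for real native buffering

        NativeCollector(boolean nativeLibsLoaded) {
            this.nativeLibsLoaded = nativeLibsLoaded;
        }

        boolean usingFallback() {
            return delegate != null;
        }

        @Override public void init() {
            if (!nativeLibsLoaded) {
                // Swap is invisible to the caller, which still holds a NativeCollector.
                delegate = new JavaCollector();
                delegate.init();
            }
        }

        @Override public void collect(String key, String value) {
            if (delegate != null) {
                delegate.collect(key, value);
            } else {
                nativeRecords++; // stand-in for handing the record to native code
            }
        }
    }

    public static void main(String[] args) {
        NativeCollector c = new NativeCollector(false);
        c.init();
        c.collect("k", "v");
        System.out.println("fallback in use: " + c.usingFallback());
    }
}
```

With this shape the native collector owns the choice of intermediate format, which is the dependency raised above; the cost is that the fallback cannot be changed without touching the native collector.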

> Allow map output collector fallback
> -----------------------------------
>                 Key: MAPREDUCE-5974
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5974
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: task
>    Affects Versions: 2.6.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: mapreduce-5974.txt
> Currently we only allow specifying a single MapOutputCollector implementation class in
> a job. It would be nice to allow a comma-separated list of classes: we should try each collector
> implementation in the user-specified order until we find one that can be successfully instantiated
> and initialized.
> This is useful for cases where a particular optimized collector implementation cannot
> operate on all key/value types, or requires native code. The cluster administrator can configure
> the cluster to try to use the optimized collector and fall back to the default collector.
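
The try-in-order behavior described in the issue could look roughly like the following sketch. It is not the attached patch: the `Collector` interface and both implementations are hypothetical stand-ins for MapOutputCollector, and the list is passed as a plain string rather than read from a job configuration.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectorFallback {
    /** Hypothetical stand-in for Hadoop's MapOutputCollector. */
    interface Collector {
        void init() throws Exception;
    }

    /** Simulates an optimized collector whose native code is unavailable. */
    static class NativeOnlyCollector implements Collector {
        public NativeOnlyCollector() {}
        @Override public void init() throws Exception {
            throw new UnsupportedOperationException("native library not loaded");
        }
    }

    /** Simulates the default collector, which always initializes. */
    static class DefaultCollector implements Collector {
        public DefaultCollector() {}
        @Override public void init() {}
    }

    /** Tries each class in the comma-separated list, in order, until one inits. */
    static Collector createCollector(String classList) {
        List<String> failures = new ArrayList<>();
        for (String raw : classList.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue;
            }
            try {
                Collector c = (Collector) Class.forName(name)
                        .getDeclaredConstructor()
                        .newInstance();
                c.init();
                return c; // first collector that instantiates and initializes wins
            } catch (Exception e) {
                // Record why this candidate was rejected, then try the next one.
                failures.add(name + ": " + e);
            }
        }
        throw new IllegalStateException("No usable collector; tried " + failures);
    }

    public static void main(String[] args) {
        String list = NativeOnlyCollector.class.getName()
                + ", " + DefaultCollector.class.getName();
        Collector chosen = createCollector(list);
        System.out.println(chosen.getClass().getSimpleName()); // prints "DefaultCollector"
    }
}
```

This sketch catches only `Exception`; a missing native library often surfaces as an `Error` such as `UnsatisfiedLinkError`, so a real implementation would have to decide whether those should also trigger the fallback.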
