hadoop-mapreduce-issues mailing list archives

From "Johannes Zillmann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
Date Tue, 04 Sep 2012 09:50:09 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447584#comment-13447584 ]

Johannes Zillmann commented on MAPREDUCE-207:

Currently, our Hadoop applications calculate the splits before handing the job to the client
for submission (the client then simply looks up the existing splits). We do that mainly to
set the reducer count based on the number of splits/map tasks.
If Hadoop computes the splits on the cluster (which makes sense), it would be nice to
have a hook for adjusting the job configuration once the splits are known!
Sometimes it also makes sense for us to decide how to assemble the MapReduce job only after
we know the splits (e.g. different join strategies for different data constellations).
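For context, the client-side workaround described above can be done with existing APIs: call `InputFormat.getSplits(JobContext)` yourself and then `Job.setNumReduceTasks(int)` before submission. Once splits are computed on the cluster, that window disappears, which is why a hook is wanted. The sketch below shows one possible shape for such a hook. It is purely hypothetical: `SplitAwareConfigurator`, `SplitSummary`, `JoinStrategy` and `RatioConfigurator` are illustrative names that do not exist in Hadoop, and the "one reducer per N maps" and size-threshold join rules are assumed example policies, not anything proposed in the issue.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: none of these types exist in Hadoop. */
public class SplitAwareHookSketch {

    /** Hypothetical stand-in for an input split; only its length is used here. */
    public record SplitSummary(long lengthBytes) {}

    /** Hypothetical choice of join assembly a job might make. */
    public enum JoinStrategy { MAP_SIDE_BROADCAST, REDUCE_SIDE }

    /** Hypothetical hook the framework could call once splits are known. */
    public interface SplitAwareConfigurator {
        int reducerCount(List<SplitSummary> splits);
        JoinStrategy joinStrategy(List<SplitSummary> splits);
    }

    /** Example policy (assumed): one reducer per N map tasks, broadcast join for small input. */
    public static final class RatioConfigurator implements SplitAwareConfigurator {
        private final int mapsPerReducer;
        private final long broadcastLimitBytes;

        public RatioConfigurator(int mapsPerReducer, long broadcastLimitBytes) {
            if (mapsPerReducer <= 0) {
                throw new IllegalArgumentException("mapsPerReducer must be positive");
            }
            this.mapsPerReducer = mapsPerReducer;
            this.broadcastLimitBytes = broadcastLimitBytes;
        }

        @Override
        public int reducerCount(List<SplitSummary> splits) {
            // One map task per split; ceiling division, but never fewer than one reducer.
            int maps = splits.size();
            return Math.max(1, (maps + mapsPerReducer - 1) / mapsPerReducer);
        }

        @Override
        public JoinStrategy joinStrategy(List<SplitSummary> splits) {
            // Illustrative threshold rule: small total input -> broadcast it to the mappers.
            long total = 0;
            for (SplitSummary s : splits) {
                total += s.lengthBytes();
            }
            return total <= broadcastLimitBytes ? JoinStrategy.MAP_SIDE_BROADCAST
                                                : JoinStrategy.REDUCE_SIDE;
        }
    }

    public static void main(String[] args) {
        SplitAwareConfigurator hook = new RatioConfigurator(10, 64L * 1024 * 1024);
        // 25 splits of 1 MiB each.
        List<SplitSummary> splits = Collections.nCopies(25, new SplitSummary(1L << 20));
        // Prints "reducers=3, join=MAP_SIDE_BROADCAST".
        System.out.println("reducers=" + hook.reducerCount(splits)
                + ", join=" + hook.joinStrategy(splits));
    }
}
```

Keeping the hook as a pure function of the split list means the framework could invoke it wherever splits end up being computed, client or cluster, without the job code caring which.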

Just dumping some ideas here...

> Computing Input Splits on the MR Cluster
> ----------------------------------------
>                 Key: MAPREDUCE-207
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: applicationmaster, mrv2
>            Reporter: Philip Zeyliger
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-207.patch
> Instead of computing the input splits as part of job submission, Hadoop could have a
> separate "job task type" that computes the input splits, therefore allowing that computation
> to happen on the cluster.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
