hadoop-mapreduce-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-4868) Allow multiple iteration for map
Date Tue, 09 Sep 2014 20:19:29 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-4868:
    Fix Version/s:     (was: 2.4.0)
                       (was: 3.0.0)

> Allow multiple iteration for map
> --------------------------------
>                 Key: MAPREDUCE-4868
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4868
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>    Affects Versions: 3.0.0, 2.0.3-alpha
>            Reporter: Jerry Chen
>   Original Estimate: 168h
>  Remaining Estimate: 168h
> Currently, the Mapper class allows advanced users to override the
> "public void run(Context context)" method for more control over the
> execution of the mapper, but the Context interface limits the operations
> available on the input data, which are the foundation of that control.
> One use case: while working on a Hive optimization problem, I want to
> make two passes over the input data instead of using another job or task
> (which may slow down the whole process). Each pass does the same thing
> but with different parameters.
> This is a new paradigm of MapReduce usage and can be achieved easily by
> extending the Context interface slightly to give more control over the
> data, such as resetting the input.
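
To make the request concrete, here is a minimal, self-contained sketch of what a two-pass run() loop might look like if the context could rewind its input. It does not use Hadoop classes: ResettableContext, InMemoryContext, and resetInput() are hypothetical names invented for this illustration, and resetInput() in particular is not part of Hadoop's Context today. Only the loop shape (nextKeyValue(), getCurrentValue(), write()) mirrors the default Mapper.run().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TwoPassMapperSketch {

    /** Hypothetical stand-in for the few Context methods used here, plus the proposed reset. */
    interface ResettableContext {
        boolean nextKeyValue();
        String getCurrentValue();
        void write(String key, String value);
        void resetInput(); // hypothetical: NOT an existing Hadoop Context method
    }

    /** In-memory context that exists only to make the sketch runnable. */
    static class InMemoryContext implements ResettableContext {
        private final List<String> records;
        private int pos = -1;
        final List<String> output = new ArrayList<>();

        InMemoryContext(List<String> records) {
            this.records = records;
        }

        @Override
        public boolean nextKeyValue() {
            if (pos + 1 < records.size()) {
                pos++;
                return true;
            }
            return false;
        }

        @Override
        public String getCurrentValue() {
            return records.get(pos);
        }

        @Override
        public void write(String key, String value) {
            output.add(key + "=" + value);
        }

        @Override
        public void resetInput() {
            pos = -1; // rewind to before the first record
        }
    }

    /** Same shape as an overridden Mapper.run(): one loop per pass, one parameter per pass. */
    static void run(ResettableContext context, int[] passParameters) {
        for (int i = 0; i < passParameters.length; i++) {
            if (i > 0) {
                context.resetInput(); // second and later passes start from the beginning again
            }
            while (context.nextKeyValue()) {
                map(context.getCurrentValue(), passParameters[i], context);
            }
        }
    }

    /** Placeholder per-record logic: tags each record with the parameter of the current pass. */
    static void map(String value, int parameter, ResettableContext context) {
        context.write("p" + parameter, value);
    }

    public static void main(String[] args) {
        InMemoryContext ctx = new InMemoryContext(Arrays.asList("a", "b"));
        run(ctx, new int[] {1, 2});
        System.out.println(ctx.output); // [p1=a, p1=b, p2=a, p2=b]
    }
}
```

In a real job the reset would presumably have to reopen or seek the underlying record reader for the split, which is the part the issue asks the framework to expose; the sketch leaves that cost out entirely.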

This message was sent by Atlassian JIRA
