hadoop-common-dev mailing list archives

From "paul sutter (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-113) Allow multiple Output Dirs to be specified for a job
Date Sat, 08 Apr 2006 22:23:25 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-113?page=comments#action_12373749 ] 

paul sutter commented on HADOOP-113:

Is the intention to have one mapper fork into multiple reducers, saving the file io of doing
independent map passes?

mapper1 -> output a -> reducer 1
        -> output b -> reducer 2

instead of

mapper1 -> output a -> reducer 1
mapper2 -> output b -> reducer 2

where, in the second example, the map input file is read twice instead of once?

That could be useful. I'm not sure how much it would really speed things up.
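
To make the comparison concrete, here is a minimal sketch in plain Python. It is not the Hadoop API and not the attached patch; `CountingInput`, `map_a`, `map_b`, `two_passes` and `one_pass` are hypothetical names made up for illustration. It only shows that forking one map pass into two outputs scans the input once, while two independent map passes scan it twice.

```python
class CountingInput:
    """Hypothetical stand-in for a map input file; counts full scans."""

    def __init__(self, records):
        self._records = list(records)
        self.scans = 0

    def __iter__(self):
        # Each new iteration models one full read of the input file.
        self.scans += 1
        return iter(self._records)


def map_a(record):
    # Hypothetical map function feeding "output a": emit (word, 1) pairs.
    for word in record.split():
        yield word, 1


def map_b(record):
    # Hypothetical map function feeding "output b": emit (length, record).
    yield len(record), record


def two_passes(source):
    # Two independent map passes: the input is read once per pass.
    out_a = [pair for record in source for pair in map_a(record)]
    out_b = [pair for record in source for pair in map_b(record)]
    return out_a, out_b


def one_pass(source):
    # One map pass forked into two outputs: the input is read once.
    out_a, out_b = [], []
    for record in source:
        out_a.extend(map_a(record))
        out_b.extend(map_b(record))
    return out_a, out_b


if __name__ == "__main__":
    first = CountingInput(["a b", "c"])
    second = CountingInput(["a b", "c"])
    two_passes(first)
    one_pass(second)
    print("scans with two passes:", first.scans)
    print("scans with one pass:", second.scans)
```

Both variants produce the same two outputs; the only difference is the number of input scans. How much that saves on a real cluster would depend on how large the input is relative to the rest of the job, which this sketch does not measure.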

> Allow multiple Output Dirs to be specified for a job
> ----------------------------------------------------
>          Key: HADOOP-113
>          URL: http://issues.apache.org/jira/browse/HADOOP-113
>      Project: Hadoop
>         Type: New Feature
>   Components: mapred
>     Versions: 0.1.0
>     Reporter: Rod Taylor
>  Attachments: hadoop_multisegment.patch
> Allow a single job to create multiple outputs. 2 additional simple functions only.
> This allows for more complex branching of the process to occur either with multiple steps
> of the same type or allow different actions to take place on each output directory depending
> on the required actions.
> For my specific use, it allows me to run multiple Generate Outputs instead of a single
> Generate Output as submitted in NUTCH-171 (http://issues.apache.org/jira/browse/NUTCH-171)

This message is automatically generated by JIRA.