hadoop-common-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1216) Hadoop should support reduce none option
Date Thu, 26 Apr 2007 08:20:15 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491911 ]

Hadoop QA commented on HADOOP-1216:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12356300/patch_1216.txt applied and successfully
tested against trunk revision r532083.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/84/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/84/console

> Hadoop should support reduce none option
> ----------------------------------------
>
>                 Key: HADOOP-1216
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1216
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Runping Qi
>         Assigned To: Runping Qi
>         Attachments: patch_1216.txt
>
>
> This has been a highly desired feature in the streaming world and has occasionally been requested
> on the non-streaming side.
> Streaming implemented a working (hacky) solution, but it also creates a discrepancy between the Hadoop
> streaming and non-streaming models. It would be nice if Hadoop offered such a feature
> that works for both streaming and non-streaming jobs. Owen and I discussed this a bit, and here is the
> general idea for further discussion and suggestions:
> 1. Allow the user to specify reducer=none in the jobconf.
> 2. The user can still specify the output format and output directory.
> 3. Each mapper will generate an output file in the specified directory. The naming convention
> can still be like part-xxxxxxxx, where xxxxxxxx is the map task number.
> 4. The mapoutput collector of a mapper task will be a record writer on the
> 5. The mapper will call output.collect() to write the output, so the same mapper class can be
> used regardless of whether reducer none is set.
> When reducer is set to none for a job, no map output files will be written to the
> local file system at all, and there will be no data shuffling between mappers and reducers.
> As a matter of fact, the framework may choose not to create reducers at all.
>  
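Points 3 to 5 of the proposal can be illustrated with a minimal, self-contained sketch, assuming a toy in-memory model rather than Hadoop itself. The mapper code is identical in both modes; only the collector handed to it changes. With reducer=none, the collector writes records straight into the task's own part file, so nothing is buffered for a shuffle. Everything here is hypothetical: `OutputCollector` is a local stand-in (not Hadoop's class), `runMapTask`, `partName`, and the map-of-lists "output directory" are invented for illustration, and the five-digit zero padding of part names is an assumption (the proposal only says part-xxxxxxxx).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model of the "reducer=none" idea. All names are hypothetical
 * stand-ins, not Hadoop's actual classes or API.
 */
public class MapOnlySketch {

    /** Hypothetical stand-in for the collector a mapper writes to. */
    public interface OutputCollector<K, V> {
        void collect(K key, V value);
    }

    /** Example mapper: emits (word, 1) per word. It never checks reducer=none. */
    public static void map(String line, OutputCollector<String, Integer> output) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                output.collect(word, 1);
            }
        }
    }

    /** Output file name for a map task; 5-digit zero padding is an assumption. */
    public static String partName(int mapTaskId) {
        return String.format("part-%05d", mapTaskId);
    }

    /**
     * Runs one map task over its input split. With reduceNone, records go
     * straight to this task's part file in the output directory (simulated
     * as file name -> lines); otherwise they go to an intermediate buffer
     * that would later be shuffled to reducers.
     */
    public static void runMapTask(int mapTaskId, List<String> split, boolean reduceNone,
                                  Map<String, List<String>> outputDir,
                                  List<String> intermediate) {
        final OutputCollector<String, Integer> collector;
        if (reduceNone) {
            List<String> partFile =
                outputDir.computeIfAbsent(partName(mapTaskId), name -> new ArrayList<>());
            collector = (k, v) -> partFile.add(k + "\t" + v);
        } else {
            collector = (k, v) -> intermediate.add(k + "\t" + v);
        }
        for (String line : split) {
            map(line, collector);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> outputDir = new TreeMap<>();
        List<String> intermediate = new ArrayList<>();
        List<List<String>> splits = List.of(List.of("a b", "c"), List.of("d"));
        for (int i = 0; i < splits.size(); i++) {
            runMapTask(i, splits.get(i), true, outputDir, intermediate);
        }
        System.out.println(outputDir.keySet());   // [part-00000, part-00001]
        System.out.println(intermediate.size());  // 0: nothing left for a shuffle
    }
}
```

The design choice the sketch tries to capture is that the reducer=none decision lives entirely in the framework's choice of collector, which is why the same mapper class can be reused unchanged in both modes.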

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

