hadoop-common-dev mailing list archives

From "Runping Qi (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1216) Hadoop should support reduce none option
Date Fri, 06 Apr 2007 16:52:32 GMT
Hadoop should support reduce none option

                 Key: HADOOP-1216
                 URL: https://issues.apache.org/jira/browse/HADOOP-1216
             Project: Hadoop
          Issue Type: New Feature
          Components: mapred
            Reporter: Runping Qi

This has been a highly desired feature in the streaming world and has been requested occasionally on the
non-streaming side.
Streaming implemented a working (hacky) solution, but that creates a discrepancy between
the streaming and non-streaming models. It would be nice if Hadoop offered such a feature
that works for both streaming and non-streaming jobs. Owen and I discussed this a bit, and here is the
general idea for further discussion and suggestions:

1. Allow the user to specify reducer=none in the jobconf.
2. The user can still specify the output format and output directory.
3. Each mapper will generate an output file in the specified directory. The naming convention
can still be part-xxxxxxxx, where xxxxxxxx is the map task number.
4. The mapoutput collector of a mapper task will be a record writer on the 
5. The mapper will call output.collect() to write the output, so the same mapper class can be
used regardless of whether reducer=none is set.

When the reducer is set to none for a job, no map output files will be written to the local
file system at all, and no data will be shuffled between mappers and reducers. As a matter of fact,
the framework may not create reducers at all.
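
The proposal above can be illustrated with a small, self-contained Java sketch. It does not use
the Hadoop API: Collector, DirectFileCollector, partName, mapLine and runMapTask are all
hypothetical names invented for illustration. Item 4 of the list is cut off in this message, so the
sketch assumes the record writer targets the per-mapper output file described in item 3, and it
assumes the map task number is zero-padded to eight digits to match part-xxxxxxxx.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReduceNoneSketch {

    /** Stand-in for the collector handed to a mapper (hypothetical, not Hadoop's interface). */
    interface Collector extends AutoCloseable {
        void collect(String key, String value) throws IOException;
        @Override void close() throws IOException;
    }

    /** Collector for reducer=none: writes records straight to the task's own output file. */
    static final class DirectFileCollector implements Collector {
        private final BufferedWriter writer;

        DirectFileCollector(Path outputDir, int mapTaskNumber) throws IOException {
            Files.createDirectories(outputDir);
            writer = Files.newBufferedWriter(outputDir.resolve(partName(mapTaskNumber)));
        }

        @Override public void collect(String key, String value) throws IOException {
            // Tab-separated key/value per line; the record format is an assumption.
            writer.write(key + "\t" + value);
            writer.newLine();
        }

        @Override public void close() throws IOException {
            writer.close();
        }
    }

    /** part-xxxxxxxx, where xxxxxxxx is the map task number (eight-digit padding assumed). */
    static String partName(int mapTaskNumber) {
        return String.format("part-%08d", mapTaskNumber);
    }

    /** The mapper only calls collect(); it does not know whether a reducer follows. */
    static void mapLine(String line, Collector output) throws IOException {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                output.collect(word, "1");
            }
        }
    }

    /** Runs one map task with reducer=none and returns the file it wrote. */
    static Path runMapTask(int mapTaskNumber, List<String> inputSplit, Path outputDir)
            throws IOException {
        try (Collector output = new DirectFileCollector(outputDir, mapTaskNumber)) {
            for (String line : inputSplit) {
                mapLine(line, output);
            }
        }
        return outputDir.resolve(partName(mapTaskNumber));
    }

    public static void main(String[] args) throws IOException {
        Path outputDir = Files.createTempDirectory("reduce-none-sketch");
        Path part0 = runMapTask(0, List.of("a b", "c"), outputDir);
        Path part1 = runMapTask(1, List.of("d"), outputDir);
        System.out.println(part0.getFileName() + " " + Files.readAllLines(part0));
        System.out.println(part1.getFileName() + " " + Files.readAllLines(part1));
    }
}
```

Because mapLine depends only on the Collector interface, the same mapper could be handed a
different collector (for example, one that buffers output for a shuffle) when a reducer is
configured. In this sketch no intermediate file is written; each map task produces exactly one
file in the output directory.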

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
