hadoop-common-user mailing list archives

From John Armstrong <j...@ccri.com>
Subject Re: Do I have to sort?
Date Mon, 18 Jun 2012 14:53:29 GMT
On 06/18/2012 10:40 AM, Mark Kerzner wrote:
> that sounds very interesting, and I may implement such a workflow, but
> can I write back to HDFS in the mapper? In the reducer it is a standard
> context.write(), but it is a different context.

Both Mapper.Context and Reducer.Context descend from 
TaskInputOutputContext, which is where the write() method is defined, so 
they're both outputting their data in the same way.

If you don't have a Reducer -- only Mappers and fully parallel data 
processing -- then set the number of reducers to zero when you configure 
your job (job.setNumReduceTasks(0)).  The mapper context then knows that 
mapper output is the last step, so it writes the data out with the 
specified OutputFormat, just as your reducer context currently does with 
reducer output.
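
To make the routing concrete, here is a rough, self-contained sketch in 
plain Java.  It does not use the Hadoop API: TaskContext, SketchMapContext, 
SketchReduceContext and run() are made-up stand-ins, and the in-memory 
lists stand in for the shuffle and for the OutputFormat.  It only models 
the idea that write() lives in one shared parent class and that, with zero 
reducers, whatever the mapper writes becomes the job output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model only: none of these classes are Hadoop's real ones.
class MapOnlySketch {

    // Stand-in for TaskInputOutputContext: the one place write() is defined.
    static class TaskContext {
        private final List<String[]> sink;
        TaskContext(List<String[]> sink) { this.sink = sink; }
        void write(String key, String value) { sink.add(new String[] {key, value}); }
    }

    // Stand-ins for Mapper.Context and Reducer.Context; both inherit write().
    static final class SketchMapContext extends TaskContext {
        SketchMapContext(List<String[]> sink) { super(sink); }
    }
    static final class SketchReduceContext extends TaskContext {
        SketchReduceContext(List<String[]> sink) { super(sink); }
    }

    // Example map function: emit (word, "1") for each token in the line.
    static void map(String line, SketchMapContext ctx) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) ctx.write(word, "1");
        }
    }

    // Example reduce function: emit (word, number of values seen).
    static void reduce(String key, List<String> values, SketchReduceContext ctx) {
        ctx.write(key, String.valueOf(values.size()));
    }

    // Runs the toy job and returns the "output file" as key<TAB>value lines.
    static List<String> run(List<String> lines, int numReduceTasks) {
        List<String[]> finalOutput = new ArrayList<>();
        if (numReduceTasks == 0) {
            // Map-only: the map context is wired straight to the final output.
            SketchMapContext ctx = new SketchMapContext(finalOutput);
            for (String line : lines) map(line, ctx);
        } else {
            // Otherwise map output goes to an intermediate buffer first.
            List<String[]> shuffle = new ArrayList<>();
            SketchMapContext mapCtx = new SketchMapContext(shuffle);
            for (String line : lines) map(line, mapCtx);

            // Group by key in sorted order, then hand each group to reduce().
            Map<String, List<String>> grouped = new TreeMap<>();
            for (String[] kv : shuffle) {
                grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
            SketchReduceContext reduceCtx = new SketchReduceContext(finalOutput);
            for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
                reduce(e.getKey(), e.getValue(), reduceCtx);
            }
        }
        List<String> formatted = new ArrayList<>();
        for (String[] kv : finalOutput) formatted.add(kv[0] + "\t" + kv[1]);
        return formatted;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b a", "a");
        System.out.println(run(input, 0)); // map-only
        System.out.println(run(input, 1)); // map, group, reduce
    }
}
```

In this toy model the map-only run returns the pairs in the order the 
mapper wrote them, ungrouped and unsorted, while the run with one reducer 
returns one line per key.  In a real job the only change on your side is 
the reducer count; the map() code calls context.write() either way.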
