hadoop-mapreduce-user mailing list archives

From Erez Katz <erez_k...@yahoo.com>
Subject Re: Partitioning Reducer Output
Date Mon, 05 Apr 2010 15:45:13 GMT
A partitioner controls how keys are distributed across reducers, overriding the
default hash(key) % num_of_reducers behavior.
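
To make the difference concrete, here is a minimal, Hadoop-free sketch of the two rules. It is plain Java with no Hadoop imports, so it only mirrors the arithmetic; in a real job the same logic would live in a Partitioner subclass registered via Job.setPartitionerClass. The class name PartitionSketch and the first-letter routing rule are made up for illustration.

```java
class PartitionSketch {
    // Default-style rule: non-negative hash of the key, modulo the reducer count.
    // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
    static int defaultPartition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Hypothetical custom rule: route by the (lower-cased) first character,
    // so all keys starting with the same letter reach the same reducer.
    static int customPartition(String key, int numReducers) {
        if (key.isEmpty()) {
            return 0; // arbitrary choice for empty keys in this sketch
        }
        return Character.toLowerCase(key.charAt(0)) % numReducers;
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "Avocado", "banana", "cherry"};
        int numReducers = 3;
        for (String key : keys) {
            System.out.println(key
                + " default=" + defaultPartition(key, numReducers)
                + " custom=" + customPartition(key, numReducers));
        }
    }
}
```

Either way, a partitioner only decides which reducer sees a key; it does not by itself change where a reducer writes its output.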

I think Rakesh is asking about having multiple "types" of output from a single map-reduce
application.

Each reducer has a temporary work directory on HDFS (pointed to by mapred.work.output.dir
in the JobConf, or by the env var "mapred_work_output_dir" if it is a streaming app).
When a reducer completes successfully, the content of its work directory is moved to the
actual output folder of the job.

A reducer can create other files in that folder. Provided that there are no name collisions
between reducers (for example, if the reducer number is appended to the file name), the
output folder can then contain multiple types of output, something like

part-00000
part-00001
part-00002
otherType-00000
otherType-00001
otherType-00002

and later on these files can be moved around to other folders...
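
As a rough sketch of the naming scheme above: the code below writes one side file per reducer number into a work directory, using the same zero-padded suffix as the part-NNNNN files. It is plain Java against the local filesystem, with a temp directory standing in for the reducer's work directory; in a real job the directory would come from mapred.work.output.dir and the file would be written through the Hadoop FileSystem API. The names SideFileSketch, sideFileName and writeSideFile are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

class SideFileSketch {
    // Builds a collision-free name such as "otherType-00002" by appending
    // the zero-padded reducer number to the chosen prefix.
    static String sideFileName(String prefix, int reducerNumber) {
        return String.format(Locale.ROOT, "%s-%05d", prefix, reducerNumber);
    }

    // Writes the given lines to the side file inside workDir and returns its path.
    static Path writeSideFile(Path workDir, String prefix, int reducerNumber,
                              List<String> lines) throws IOException {
        Path target = workDir.resolve(sideFileName(prefix, reducerNumber));
        return Files.write(target, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the reducer's work directory (assumption: local temp dir).
        Path workDir = Files.createTempDirectory("work-output-");
        for (int reducer = 0; reducer < 3; reducer++) {
            writeSideFile(workDir, "otherType", reducer,
                Arrays.asList("record from reducer " + reducer));
        }
        // List what ended up in the directory.
        try (Stream<Path> files = Files.list(workDir)) {
            files.map(p -> p.getFileName().toString()).sorted().forEach(System.out::println);
        }
    }
}
```

Because each reducer only ever writes names carrying its own number, the files cannot collide when the work directories are merged into the output folder.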

hope it helps,

  Erez Katz


--- On Mon, 4/5/10, David Rosenstrauch <darose@darose.net> wrote:

From: David Rosenstrauch <darose@darose.net>
Subject: Re: Partitioning Reducer Output
To: mapreduce-user@hadoop.apache.org
Date: Monday, April 5, 2010, 7:35 AM

On 04/02/2010 08:32 PM, rakesh kothari wrote:
>
> Hi,
>
> What's the best way to partition data generated from Reducer into multiple
> directories in Hadoop 0.20.1. I was thinking of using MultipleTextOutputFormat
> but that's not backward compatible with other API's in this version of
> hadoop.
>
> Thanks,
> -Rakesh

Use a partitioner?

http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/Job.html#setPartitionerClass%28java.lang.Class%29

HTH,

DR
