hadoop-mapreduce-user mailing list archives

From rakesh kothari <rkothari_...@hotmail.com>
Subject RE: Partitioning Reducer Output
Date Mon, 05 Apr 2010 23:04:12 GMT

Thanks for the insights.

My use case is more around sending the reducer output to subdirectories representing date
partitions.

For example if the base reducer output directory is /hdfs/root/reducer/ and if there are two
records encountered by reducer and one is timestamped with date 2010/01/01 and other with
date 2010/01/02 then the records are written to files in directories "/hdfs/root/reducer/2010/01/01"
and "/hdfs/root/reducer/2010/01/02" respectively.
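
A minimal sketch of the directory mapping described above, in plain Java with no Hadoop dependency. The class and method names (DatePartitionPath, outputDirFor) are hypothetical, and it assumes each record's date is already available as a LocalDate; it only computes the target directory and does not write any files.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: maps a record's date to a date-partitioned subdirectory.
final class DatePartitionPath {
    // One directory level each for year, month and day, e.g. 2010/01/02.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Returns baseDir joined with the yyyy/MM/dd subdirectory for the date.
    static String outputDirFor(String baseDir, LocalDate date) {
        String base = baseDir.endsWith("/") ? baseDir : baseDir + "/";
        return base + DAY.format(date);
    }

    public static void main(String[] args) {
        System.out.println(outputDirFor("/hdfs/root/reducer/", LocalDate.of(2010, 1, 1)));
        System.out.println(outputDirFor("/hdfs/root/reducer/", LocalDate.of(2010, 1, 2)));
    }
}
```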

MultipleTextOutputFormat was designed to support such use cases, but it's not ported to 0.20.1.
I was hoping there is a workaround.

Thanks,
-Rakesh

Date: Mon, 5 Apr 2010 08:45:13 -0700
From: erez_katz@yahoo.com
Subject: Re: Partitioning Reducer Output
To: mapreduce-user@hadoop.apache.org

A partitioner can be used to control how keys are distributed across reducers (overriding
the default hash(key)%num_of_reducers behavior).
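
A rough sketch of that default rule in plain Java, with no Hadoop dependency. The class and method names (HashPartitionSketch, partitionFor) are made up for illustration; the sign bit is masked off so that a negative hashCode() still yields a valid reducer index.

```java
// Hypothetical stand-in for the default hash(key) % num_of_reducers rule.
final class HashPartitionSketch {
    // Returns a reducer index in the range [0, numReducers).
    static int partitionFor(Object key, int numReducers) {
        // Clear the sign bit first; hashCode() may be negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // Equal keys always land on the same reducer.
        System.out.println(partitionFor("2010/01/01", 3));
        System.out.println(partitionFor("2010/01/02", 3));
    }
}
```

Note that this only decides which reducer sees a key; it does not by itself change where that reducer writes its output.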

I think Rakesh is asking about having multiple "types" of output from a single map-reduce
application.

Each reducer has a tmp work directory on hdfs (pointed to by the jobconf property
mapred.work.output.dir, or by the env var "mapred_work_output_dir" if it is a streaming app).
The content of that folder, for a reducer that completed successfully, is moved to the actual
output folder of the task.

A reducer can create other files in that folder and, provided that there are no name collisions
between reducers (meaning the reducer number is appended to the file name), one can
have the output folder contain multiple types of outputs, something like

part-00000
part-00001
part-00002
otherType-00000
otherType-00001
otherType-00002

and later on these files can be moved around to other folders...
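
The naming scheme above can be sketched in plain Java as follows. This is only an illustration under assumptions: the class and method names (SideFileName, nameFor) are hypothetical, and the reducer number is taken as a plain int rather than read from the job configuration.

```java
import java.util.Locale;

// Hypothetical helper: builds collision-free side-file names per reducer.
final class SideFileName {
    // Appends the zero-padded reducer number, mirroring part-00000 style names.
    static String nameFor(String outputType, int reducerNumber) {
        // Locale.ROOT keeps the digits ASCII regardless of the JVM's default locale.
        return String.format(Locale.ROOT, "%s-%05d", outputType, reducerNumber);
    }

    public static void main(String[] args) {
        for (int r = 0; r < 3; r++) {
            System.out.println(nameFor("part", r) + "  " + nameFor("otherType", r));
        }
    }
}
```

Because each reducer uses its own number, two reducers never produce the same file name for a given output type.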

hope it helps,

  Erez Katz


--- On Mon, 4/5/10, David Rosenstrauch <darose@darose.net> wrote:

From: David Rosenstrauch <darose@darose.net>
Subject: Re: Partitioning Reducer Output
To: mapreduce-user@hadoop.apache.org
Date: Monday, April 5, 2010, 7:35 AM

On 04/02/2010 08:32 PM, rakesh kothari wrote:
>
> Hi,
>
> What's the best way to partition data generated from Reducer into multiple
> directories in Hadoop 0.20.1. I was thinking of using MultipleTextOutputFormat
> but that's not backward compatible with other API's in this version of
> hadoop.
>
> Thanks,
> -Rakesh                         

Use a partitioner?

http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/Job.html#setPartitionerClass%28java.lang.Class%29

HTH,

DR

 		 	   		  