hadoop-mapreduce-user mailing list archives

From Eric Sammer <esam...@cloudera.com>
Subject Re: Partitioning Reducer Output
Date Tue, 06 Apr 2010 02:01:57 GMT
On Mon, Apr 5, 2010 at 4:04 PM, rakesh kothari <rkothari_iit@hotmail.com> wrote:

>  Thanks for the insights.
>
> My use case is more around sending the reducer output to subdirectories
> representing date partitions.
>
> For example if the base reducer output directory is /hdfs/root/reducer/ and
> if there are two records encountered by reducer and one is timestamped with
> date 2010/01/01 and other with date 2010/01/02 then the records are written
> to files in directories "/hdfs/root/reducer/2010/01/01" and
> "/hdfs/root/reducer/2010/01/02" respectively.
>
> MultipleTextOutputFormat was designed to support such use cases, but it's not
> ported to 0.20.1. I was hoping there is a workaround.
>

Unfortunately, your only options right now are:

1. Write to HDFS directly from your reducers, being careful that tasks do not
clobber each other's output.
2. Revert to the "old" APIs and use MultipleTextOutputFormat (MTOF) or
MultipleOutputs (MO), as you've mentioned.

I believe CDH3 has (or will have) updated versions of MTOF and MO for the
new APIs but don't quote me on that.
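
For option 1, the part that usually goes wrong is file naming: two tasks that see the same date must not open the same file. Below is a minimal sketch of one way to build a path that is unique per task, assuming the date is already parsed out of the record. The class and method names are hypothetical, and the partition number and attempt number are plain ints here; in a real reducer they would presumably come from the task attempt ID on the context, and the file would be opened through the HDFS FileSystem API.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical helper (not part of Hadoop): builds date-partitioned, per-task output paths. */
final class DatePartitionedPaths {
    // Renders a date as nested directories, e.g. 2010/01/01.
    private static final DateTimeFormatter DATE_DIRS =
            DateTimeFormatter.ofPattern("yyyy/MM/dd");

    private DatePartitionedPaths() {}

    /** Directory for one date partition under the base reducer output directory. */
    static String partitionDir(String baseDir, LocalDate date) {
        // Drop a single trailing slash so the result never contains "//".
        String base = baseDir.endsWith("/")
                ? baseDir.substring(0, baseDir.length() - 1)
                : baseDir;
        return base + "/" + date.format(DATE_DIRS);
    }

    /**
     * File for one reduce task attempt inside a date partition.
     * Assumption: embedding the reduce partition number and the attempt
     * number in the name is enough to keep concurrent tasks (and retried
     * or speculative attempts) from writing to the same file.
     */
    static String outputFile(String baseDir, LocalDate date,
                             int reducePartition, int attempt) {
        return String.format(Locale.ROOT, "%s/part-r-%05d-attempt-%d",
                partitionDir(baseDir, date), reducePartition, attempt);
    }
}
```

With the directories from the example above, outputFile("/hdfs/root/reducer/", LocalDate.of(2010, 1, 1), 3, 0) gives "/hdfs/root/reducer/2010/01/01/part-r-00003-attempt-0". Keep in mind that files written this way bypass the job's normal output commit, so leftovers from failed or speculative attempts have to be cleaned up by your own code.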
-- 
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com
