hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Using MultipleTextOutputFormat for map-only jobs
Date Thu, 14 Apr 2011 07:52:38 GMT
Hello Hari,

On Thu, Apr 14, 2011 at 11:09 AM, Hari Sreekumar
<hsreekumar@clickable.com> wrote:
> Hi,
> I have a map-only mapreduce job where I want to deduce the output filename
> from the output key/value. I figured MultipleTextOutputFormat is the best
> fit for my purpose. But I am unable to use it in map-only jobs. I was able
> to run it if I add a reduce phase. But when I use map-only jobs, the file
> gets written to the usual part-0000xx files. Also, is there no support for
> this output format in v0.20.2? I mean, is it necessary to use the deprecated
> classes if I want to use this?
> Thanks,
> Hari

The MultipleOutputFormat class is not available in Hadoop's new,
still-unstable API; its functionality has been replaced there by the
MultipleOutputs class, which works in a very similar way. However, the
new-API MultipleOutputs is not part of the Apache Hadoop 0.20.2
release either [1].

Using the stable (old) API is still recommended: it is no longer
marked deprecated in 0.20.3, and 0.21 also supports it.

That said, MultipleOutputFormat should still work for map-only jobs,
as described in two of its use cases [2]. Could you give us some
details of how your code sets this up?
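
For comparison, here is a rough sketch of the naming logic such a
setup usually revolves around. It is plain Java so it runs without
Hadoop on the classpath; the class name KeyBasedFileNames, the method
fileNameFor and the sanitising rule are made up for illustration and
are not taken from your code. In a real old-API job I would expect
the same expression to sit in an override of
generateFileNameForKeyValue(key, value, name) in a subclass of
org.apache.hadoop.mapred.lib.MultipleTextOutputFormat, with the job
configured through JobConf.setOutputFormat(...) and
JobConf.setNumReduceTasks(0).

```java
// Hypothetical helper: derives an output path from a record's key.
// In a Hadoop job this logic would live inside
//   protected String generateFileNameForKeyValue(K key, V value, String name)
// of a MultipleTextOutputFormat subclass (old "mapred" API).
final class KeyBasedFileNames {
    private KeyBasedFileNames() {}

    /**
     * @param key      the output key emitted by the mapper
     * @param leafName the default leaf file name the framework passes in
     *                 (for example "part-00000")
     * @return a relative path of the form "<sanitised key>/<leafName>"
     */
    static String fileNameFor(String key, String leafName) {
        // Assumption: keys may contain characters that are unsafe in
        // file names, so anything outside [A-Za-z0-9._-] becomes '_'.
        String dir = (key == null || key.isEmpty())
                ? "unknown"
                : key.replaceAll("[^A-Za-z0-9._-]", "_");
        // Keeping the leaf name means each task still writes its own
        // file, so parallel map tasks do not collide on one path.
        return dir + "/" + leafName;
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor("clicks", "part-00000"));
        System.out.println(fileNameFor("a/b c", "part-00003"));
    }
}
```

Keeping the framework-supplied leaf name in the returned path is a
deliberate choice in this sketch: with zero reducers every map task
writes its own output, so two tasks emitting the same key would
otherwise target the same file.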

[1] - It is available as part of 0.21.0, though, or in Cloudera's
Distribution including Apache Hadoop 0.20.2.
[2] - http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/MultipleOutputFormat.html

Harsh J
