hadoop-mapreduce-user mailing list archives

From Geoffry Roberts <geoffry.robe...@gmail.com>
Subject Re: Does Using MultipleTextOutputFormat Require the Deprecated API?
Date Tue, 08 Dec 2009 20:40:03 GMT
All,

This one has me stumped.

What I want to do is have my reducer write multiple output files, one for
each key value. I also want to avoid any deprecated parts of the API.

As suggested, I switched from MultipleTextOutputFormat to
MultipleOutputs but have hit an impasse. MultipleOutputs' getCollector
method requires a Reporter as a parameter, but as far as I can tell, the new
API does not expose one. The only reporter I can find is in the context
object, but it is declared protected.

Am I stuck, or just missing something?

My code:

@Override
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException {
    String fileName = key.toString();
    MultipleOutputs.addNamedOutput((JobConf) context.getConfiguration(),
            fileName, OutputFormat.class, Text.class, Text.class);
    mos = new MultipleOutputs((JobConf) context.getConfiguration());
    for (Text line : values) {
        // This is the problem line:
        mos.getCollector(fileName, <reporter goes here>).collect(key, line);
    }
    mos.close();
}
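
A side note on the fileName step above: as far as I recall from the MultipleOutputs Javadoc, named output names may consist of letters and digits only, so a raw key such as "New York" would be rejected by addNamedOutput. Below is a minimal plain-Java sketch of one way to derive a usable name from a key. The class OutputNames and the method toNamedOutput are hypothetical helpers made up for illustration; they are not part of Hadoop.

```java
// Hypothetical helper, not part of Hadoop: derives a name made only of
// ASCII letters and digits from an arbitrary reducer key.
final class OutputNames {
    private OutputNames() {}

    /** Keeps ASCII letters and digits from the key and drops everything else. */
    static String toNamedOutput(String key) {
        StringBuilder sb = new StringBuilder(key.length());
        for (int i = 0; i < key.length(); i++) {
            char ch = key.charAt(i);
            boolean letter = (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
            boolean digit = ch >= '0' && ch <= '9';
            if (letter || digit) {
                sb.append(ch);
            }
        }
        // Assumption: an empty name is never acceptable, so fail early.
        if (sb.length() == 0) {
            throw new IllegalArgumentException("key has no letters or digits: " + key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNamedOutput("New York"));
        System.out.println(toNamedOutput("2009-12-08"));
    }
}
```

Dropping characters can make two different keys collide (for example "a-b" and "ab"), so a real job may need a different mapping. As for the Reporter question itself, my understanding is that the new-API class Aaron mentions, o.a.h.mapreduce.lib.output.MultipleOutputs, is constructed from the task context and writes through a write(...) method rather than getCollector, so no Reporter is needed; that should be checked against the Javadoc of the release in use.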

On Mon, Oct 5, 2009 at 11:17 AM, Aaron Kimball <aaron@cloudera.com> wrote:

> Geoffry,
>
> The new API comes with a related OF, called MultipleOutputs
> (o.a.h.mapreduce.lib.output.MultipleOutputs). You may want to look into
> using this instead.
>
> - Aaron
>
>
> On Tue, Sep 29, 2009 at 4:44 PM, Geoffry Roberts <
> geoffry.roberts@gmail.com> wrote:
>
>> All,
>>
>> What I want to do is have my reducer write multiple output files, one for
>> each key value.
>>
>> Can this still be done in the current API?
>>
>> It seems that using MultipleTextOutputFormat requires one to use
>> deprecated parts of the API.
>>
>> Is this correct?
>>
>> I would like to use the class or its equivalent and stay off anything
>> deprecated.
>>
>> Is there a workaround?
>>
>> In the current API, one uses Job and a class derived from the class
>> org.apache.hadoop.mapreduce.OutputFormat.
>> MultipleTextOutputFormat does not derive from this class.
>>
>> Job.setOutputFormatClass(Class<? extends org.apache.hadoop.mapreduce.OutputFormat>);
>>
>>
>> In the old, deprecated API, one uses JobConf and an implementation of the
>> interface org.apache.hadoop.mapred.OutputFormat.
>> MultipleTextOutputFormat is just such an implementation.
>>
>> JobConf.setOutputFormat(Class<? extends org.apache.hadoop.mapred.OutputFormat>);
>>
>
>
