hadoop-mapreduce-user mailing list archives

From Geoffry Roberts <geoffry.robe...@gmail.com>
Subject Re: Does Using MultipleTextOutputFormat Require the Deprecated API?
Date Wed, 09 Dec 2009 15:24:26 GMT
Aaron,

I am using 0.20.1 and I'm not finding
org.apache.hadoop.mapreduce.lib.output.MultipleOutputs. I'm using the
download page, where the tarball is dated from Sep. '09.

Sounds like I need to look at the code repository.


On Tue, Dec 8, 2009 at 1:39 PM, Aaron Kimball <aaron@cloudera.com> wrote:

> Geoffry,
>
> There are two MultipleOutputs implementations; one for the new API, one for
> the old one.
>
> The new API (org.apache.hadoop.mapreduce.lib.output.MultipleOutputs) does
> not have a getCollector() method. This is intended to work with
> org.apache.hadoop.mapreduce.Mapper and its associated Context object.
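A minimal sketch of the Context-based pattern described above, assuming 0.21 or later; the named output "text", the Text/Text types, and the reducer class name are illustrative, not taken from this thread:

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    // Driver side (given a Job named "job"): register a named output up front.
    MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class,
            Text.class, Text.class);

    // Reducer: no getCollector()/Reporter; writes go through a
    // MultipleOutputs built from the Context.
    public static class PerKeyReducer extends Reducer<Text, Text, Text, Text> {
        private MultipleOutputs<Text, Text> mos;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<Text, Text>(context);
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text line : values) {
                mos.write("text", key, line);
                // For per-key file names there is also a
                // write(key, value, baseOutputPath) overload.
            }
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            mos.close();
        }
    }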
>
> The old API implementation of MO
> (org.apache.hadoop.mapred.lib.MultipleOutputs) is intended to work with
> org.apache.hadoop.mapred.Mapper, Reporter, and friends.
>
> If you're going to use the new org.apache.hadoop.mapreduce-based code, you
> should not need to import anything in the mapred package. That having been
> said -- I just realized that the new-API-compatible MultipleOutputs
> implementation is not in Hadoop 0.20. It's only in the unreleased 0.21. If
> you're using 0.20, you should probably stick with the old API for your
> process.
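And a rough sketch of the old-API route on 0.20, which is what "stick with the old API" amounts to. Named outputs are registered on the JobConf in the driver, not per key inside reduce(), and the Reporter that getCollector() needs is handed straight to the old-style reduce(). The "text" name and the classes used here are illustrative assumptions:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.mapred.lib.MultipleOutputs;

    // Driver side (given a JobConf named "conf"): register the named output.
    MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
            Text.class, Text.class);

    // Old-API reducer: getCollector() takes the Reporter passed to reduce().
    public static class PerKeyReducer extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {
        private MultipleOutputs mos;

        @Override
        public void configure(JobConf conf) {
            mos = new MultipleOutputs(conf);
        }

        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            while (values.hasNext()) {
                mos.getCollector("text", reporter).collect(key, values.next());
            }
        }

        @Override
        public void close() throws IOException {
            mos.close();
        }
    }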
>
> Cheers,
> - Aaron
>
>
> On Tue, Dec 8, 2009 at 12:40 PM, Geoffry Roberts <geoffry.roberts@gmail.com> wrote:
>
>> All,
>>
>> This one has me stumped.
>>
>> What I want to do is output multiple files from my reducer, one for each
>> key value. I also want to avoid any deprecated parts of the API.
>>
>> As suggested, I switched from MultipleTextOutputFormat to MultipleOutputs,
>> but I have run into an impasse.  MultipleOutputs' getCollector method
>> requires a Reporter as a parameter, but as far as I can tell, the new API
>> doesn't give me access to one.  The only Reporter I can find is in the
>> context object, and it is declared protected.
>>
>> Am I stuck, or am I just missing something?
>>
>> My code:
>>
>> @Override
>> public void reduce(Text key, Iterable<Text> values, Context context)
>>         throws IOException {
>>     String fileName = key.toString();
>>     MultipleOutputs.addNamedOutput((JobConf) context.getConfiguration(),
>>             fileName, OutputFormat.class, Text.class, Text.class);
>>     mos = new MultipleOutputs((JobConf) context.getConfiguration());
>>     for (Text line : values) {
>>         // This is the problem line:
>>         mos.getCollector(fileName, <reporter goes here>).collect(key, line);
>>     }
>>     mos.close();
>> }
>>
>> On Mon, Oct 5, 2009 at 11:17 AM, Aaron Kimball <aaron@cloudera.com> wrote:
>>
>>> Geoffry,
>>>
>>> The new API comes with a related OF, called MultipleOutputs
>>> (o.a.h.mapreduce.lib.output.MultipleOutputs). You may want to look into
>>> using this instead.
>>>
>>> - Aaron
>>>
>>>
>>> On Tue, Sep 29, 2009 at 4:44 PM, Geoffry Roberts <geoffry.roberts@gmail.com> wrote:
>>>
>>>> All,
>>>>
>>>> What I want to do is output multiple files from my reducer, one for each
>>>> key value.
>>>>
>>>> Can this still be done in the current API?
>>>>
>>>> It seems that using MultipleTextOutputFormat requires one to use
>>>> deprecated parts of the API.
>>>>
>>>> Is this correct?
>>>>
>>>> I would like to use this class, or its equivalent, and stay off anything
>>>> deprecated.
>>>>
>>>> Is there a workaround?
>>>>
>>>> In the current API, one uses Job and a class derived from
>>>> org.apache.hadoop.mapreduce.OutputFormat. MultipleTextOutputFormat does
>>>> not derive from this class.
>>>>
>>>> Job.setOutputFormatClass(Class<? extends org.apache.hadoop.mapreduce.OutputFormat>);
>>>>
>>>>
>>>> In the old, deprecated API, one uses JobConf and an implementation of
>>>> the interface org.apache.hadoop.mapred.OutputFormat.
>>>> MultipleTextOutputFormat is just such an implementation.
>>>>
>>>> JobConf.setOutputFormat(Class<? extends org.apache.hadoop.mapred.OutputFormat>);
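For completeness, here is roughly what the old-API, MultipleTextOutputFormat route looks like; the subclass name and the choice to name each output file after its key are illustrative assumptions:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

    // Old API only: route each record to an output file named after its key.
    public static class PerKeyTextOutputFormat
            extends MultipleTextOutputFormat<Text, Text> {
        @Override
        protected String generateFileNameForKeyValue(Text key, Text value,
                String name) {
            return key.toString();   // one file per key value
        }
    }

    // Driver-side fragment: wired up through JobConf, since the deprecated
    // interface is required.
    JobConf conf = new JobConf();
    conf.setOutputFormat(PerKeyTextOutputFormat.class);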
>>>>
>>>
>>>
>>
>
