hadoop-common-user mailing list archives

From Shirley Cohen <shirl...@cis.upenn.edu>
Subject Re: MultipleOutputFormat versus MultipleOutputs
Date Fri, 29 Aug 2008 15:13:49 GMT
Thanks, Benjamin. Your example saved me a lot of time :))

Shirley

On Aug 28, 2008, at 8:03 AM, Benjamin Gufler wrote:

> Hi Shirley,
>
> On 2008-08-28 14:32, Shirley Cohen wrote:
>> Do you have an example that shows how to use MultipleOutputFormat?
>
> Using MultipleOutputFormat is actually pretty easy. Derive a class from
> it and, if you want to base the destination file name on the key and/or
> value, override the method "generateFileNameForKeyValue". I'm using it
> this way:
>
> protected String generateFileNameForKeyValue(K key, V value,
>         String name) {
>     return name + "-" + key.toString();
> }
>
> Take care not to generate too many different file names, however: all
> the files are kept open until the Reducer terminates, and operating
> systems usually impose a limit on the number of files a process can
> have open.
>
> Also, if you haven't done so yet, please upgrade to the latest release,
> 0.18, if you want to use MultipleOutputFormat. Up to 0.17.2, there was
> some trouble with Reducers having more than one output file (see
> HADOOP-3639 for the details).
>
> Benjamin
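
To illustrate the warning about too many file names in the quoted message, here is a minimal standalone sketch. It does not use Hadoop at all: the class FileNameSketch, the String-typed method signatures, the helper bucketedFileName, the "user" keys, and the bucket count of 16 are all assumptions made for illustration. The first method mirrors the quoted override (one file per distinct key); the second is one possible way to keep the number of distinct names bounded by hashing keys into a fixed number of buckets.

```java
import java.util.HashSet;
import java.util.Set;

public class FileNameSketch {
    // Mirrors the quoted override: one output file name per distinct key.
    // (Plain Strings stand in for Hadoop's K and V types here.)
    static String generateFileNameForKeyValue(String key, String value, String name) {
        return name + "-" + key;
    }

    // Hypothetical variant: hash each key into one of `buckets` buckets so
    // the number of distinct file names never exceeds `buckets`.
    static String bucketedFileName(String key, String name, int buckets) {
        int bucket = Math.floorMod(key.hashCode(), buckets);
        return name + "-" + bucket;
    }

    public static void main(String[] args) {
        Set<String> perKey = new HashSet<>();
        Set<String> bucketed = new HashSet<>();
        for (int i = 0; i < 10000; i++) {
            String key = "user" + i;
            perKey.add(generateFileNameForKeyValue(key, "", "part-00000"));
            bucketed.add(bucketedFileName(key, "part-00000", 16));
        }
        // Per-key naming yields one name per key; bucketing yields at most 16.
        System.out.println("per-key names:  " + perKey.size());
        System.out.println("bucketed names: " + bucketed.size());
    }
}
```

With per-key naming, the number of files a Reducer holds open grows with the number of distinct keys it sees, whereas the bucketed variant caps it at the chosen bucket count, at the cost of mixing several keys in one file.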

