hadoop-common-user mailing list archives

From "Koji Noguchi" <knogu...@yahoo-inc.com>
Subject RE: Multiple outputs and getmerge?
Date Tue, 21 Apr 2009 20:53:31 GMT
Something along the lines of

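... class MyOutputFormat extends MultipleTextOutputFormat<Text, Text> {
    // Prefix the default leaf name (e.g. part-00000) with the key,
    // so each key's records end up in their own subdirectory.
    @Override
    protected String generateFileNameForKeyValue(Text key,
                                                 Text value, String name) {
      Path outpath = new Path(key.toString(), name);
      return outpath.toString();
    }
}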

would create a directory per key.
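
To see what names this produces without a cluster, here is a plain-Java sketch that mimics only the string logic of generateFileNameForKeyValue above. It does not use any Hadoop classes; the class and method names (PerKeyNaming, fileNameFor) are made up for illustration, and it assumes simple keys and a leaf name such as part-00000.

```java
public class PerKeyNaming {
    // Stand-in for new Path(key, name).toString() with simple relative names:
    // the key becomes a directory and the leaf name stays the file name.
    // (Hypothetical helper, not part of Hadoop.)
    static String fileNameFor(String key, String leafName) {
        return key + "/" + leafName;
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor("type1", "part-00000")); // type1/part-00000
        System.out.println(fileNameFor("type2", "part-00000")); // type2/part-00000
    }
}
```

So records with key type1 would go to type1/part-00000 under the output directory, records with key type2 to type2/part-00000, and so on.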

If you just want to keep your side-effect files separate, then
get your task's working directory from
FileOutputFormat.getWorkOutputPath(...)
or $mapred_work_output_dir,

then dfs -mkdir <workdir>/NewDir and put the secondary files there.

Explained in 

http://hadoop.apache.org/core/docs/r0.18.3/api/org/apache/hadoop/mapred/FileOutputFormat.html#getWorkOutputPath(org.apache.hadoop.mapred.JobConf)
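
As a rough picture of the layout, here is a local-filesystem sketch using java.nio.file rather than the Hadoop API. The temp directory only stands in for the task's work directory; in a real task that path would come from FileOutputFormat.getWorkOutputPath(conf) and the files would be written through Hadoop's FileSystem. The names SideFileLayout, writeSideFile and secondary.txt are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SideFileLayout {
    // Creates <workDir>/<subDir>/ if needed and writes one side file into it.
    // Local stand-in only: a real task would use the Hadoop FileSystem API.
    static Path writeSideFile(Path workDir, String subDir,
                              String fileName, String content) {
        try {
            Path dir = Files.createDirectories(workDir.resolve(subDir));
            Path file = dir.resolve(fileName);
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temp directory plays the role of the work output dir.
        Path workDir = Files.createTempDirectory("work-output");
        Path side = writeSideFile(workDir, "NewDir", "secondary.txt", "side data\n");
        System.out.println("wrote " + side);
    }
}
```

The point is only that the secondary files live in their own subdirectory of the work directory, next to (not mixed in with) the regular part files.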


Koji


-----Original Message-----
From: Stuart White [mailto:stuart.white1@gmail.com] 
Sent: Tuesday, April 21, 2009 11:46 AM
To: core-user@hadoop.apache.org
Subject: Re: Multiple outputs and getmerge?

On Tue, Apr 21, 2009 at 1:00 PM, Koji Noguchi <knoguchi@yahoo-inc.com> wrote:
>
> I once used MultipleOutputFormat and created
>   (mapred.work.output.dir)/type1/part-_____
>   (mapred.work.output.dir)/type2/part-_____
>    ...
>
> And JobTracker took care of the renaming to
>   (mapred.output.dir)/type{1,2}/part-______
>
> Would that work for you?

Can you please explain this in more detail?  It looks like you're
using MultipleOutputFormat for *both* of your outputs?  So, you simply
don't use the OutputCollector passed as a parameter to Mapper#map()?
