hadoop-mapreduce-user mailing list archives

From Christoph Schmitz <christoph.schm...@1und1.de>
Subject Re: Out-of-band writing from mapper
Date Wed, 20 Apr 2011 09:30:25 GMT
Hello Harsh,

thanks for your help!

I've found that this works for me in my Mapper's setup() method:

FileSystem fs = FileSystem.get(context.getConfiguration());
// get the attempt directory
Path outputDir = FileOutputFormat.getWorkOutputPath(context);
Path outputPath = new Path(outputDir, "out-of-band-output-" + 

These files are handled the way the FAQ describes (for example, they are
cleaned up for failed attempts). I will add a pointer to
FileOutputFormat.getWorkOutputPath to the FAQ.
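
To illustrate why writing below the attempt directory gives this behaviour, here is a minimal sketch of the general pattern on the local filesystem, using only java.nio. It is not Hadoop's implementation: the class AttemptDirSketch, its methods, the "_temporary" layout and the file name "side-file-0" are all hypothetical names chosen for the illustration. The assumption is simply that each attempt writes into its own directory, which is promoted on success and deleted on failure.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class AttemptDirSketch {

    // Hypothetical layout: <jobOutput>/_temporary/<attemptId>
    static Path attemptDir(Path jobOutput, String attemptId) throws IOException {
        return Files.createDirectories(jobOutput.resolve("_temporary").resolve(attemptId));
    }

    // A task attempt writes its side file below its own attempt directory.
    static Path writeSideFile(Path attemptDir, String name, String content) throws IOException {
        return Files.write(attemptDir.resolve(name), content.getBytes(StandardCharsets.UTF_8));
    }

    // Snapshot the directory entries so we can safely modify the directory afterwards.
    private static List<Path> entries(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                result.add(p);
            }
        }
        return result;
    }

    // Successful attempt: move its files into the job output, then drop the attempt directory.
    static void commit(Path jobOutput, Path attemptDir) throws IOException {
        for (Path file : entries(attemptDir)) {
            Files.move(file, jobOutput.resolve(file.getFileName()));
        }
        Files.delete(attemptDir);
    }

    // Failed or killed attempt: delete its files; nothing reaches the job output.
    static void abort(Path attemptDir) throws IOException {
        for (Path file : entries(attemptDir)) {
            Files.delete(file);
        }
        Files.delete(attemptDir);
    }

    public static void main(String[] args) throws IOException {
        Path jobOutput = Files.createTempDirectory("job-output");

        Path good = attemptDir(jobOutput, "attempt_0");
        writeSideFile(good, "side-file-0", "kept");
        commit(jobOutput, good);

        Path bad = attemptDir(jobOutput, "attempt_1");
        writeSideFile(bad, "side-file-1", "discarded");
        abort(bad);

        System.out.println("side-file-0 present: " + Files.exists(jobOutput.resolve("side-file-0")));
        System.out.println("side-file-1 present: " + Files.exists(jobOutput.resolve("side-file-1")));
    }
}
```

Because a speculative or failed attempt only ever touches its own directory, its files cannot collide with those of the attempt that finally succeeds.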

PS. As far as I understand, MultipleOutputs would be used in the 
reducer, right? (Which I wanted to avoid for the bulk of my data.)


On 04/20/2011 11:18 AM, Harsh J wrote:
> Hello Christoph,
> On Wed, Apr 20, 2011 at 2:12 PM, Christoph Schmitz
> <Christoph.Schmitz@1und1.de>  wrote:
>> My question is: is there any mechanism to assist me in writing to some designated
>> place in the HDFS from the mapper, in a way that is recognized by the framework (i.e. dealing
>> with aborted tasks, speculative execution etc.)?
>> I was thinking along the lines of what is described in the FAQ here:
>> http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F
>> The FAQ explains that for reducers, there is support for special per-task output
>> directories that are recognized by the framework, but it seems (I tried it out) that this
>> is not supported for mappers.
> [Perhaps you can consider using the MultipleOutputs class to write
> output files from your job, instead of writing your own FS handling
> code.]
> The attempt directories are created for both Map and Reduce tasks. If
> the FAQ makes this ambiguous, it ought to be fixed :)
