hadoop-common-user mailing list archives

From Bryan Keller <brya...@gmail.com>
Subject Re: How do I create per-reducer temporary files?
Date Wed, 04 May 2011 12:11:08 GMT
I too am looking for the best place to put local temp files I create during reduce processing.
I am hoping there is a variable or property someplace that defines a per-reducer temp directory.
The "mapred.child.tmp" property defaults to the relative directory "./tmp", so it
isn't useful on its own.

I have 5 drives being used in "mapred.local.dir", and I was hoping to use them all for writing
temp files, rather than specifying a single temp directory that all my reducers use.
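For what it's worth, one way to spread temp files across those drives yourself is to split the comma-separated "mapred.local.dir" value and pick a directory per task. The sketch below is hand-rolled plain Java, not a Hadoop API (Hadoop's own round-robin allocator is org.apache.hadoop.fs.LocalDirAllocator, which may be the better fit); the directory paths and task-id scheme are illustrative:

```java
public class TempDirPicker {
    // Pick one of the configured local dirs, round-robin by task id,
    // so different reducers land on different drives.
    // NOTE: this parsing is a hand-rolled sketch, not Hadoop's own
    // allocator (see org.apache.hadoop.fs.LocalDirAllocator).
    static String pickDir(String localDirs, int taskId) {
        String[] dirs = localDirs.split(",");
        return dirs[taskId % dirs.length].trim();
    }

    public static void main(String[] args) {
        String dirs = "/disk1/mapred,/disk2/mapred,/disk3/mapred";
        System.out.println(pickDir(dirs, 0)); // /disk1/mapred
        System.out.println(pickDir(dirs, 4)); // /disk2/mapred
    }
}
```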


On Apr 9, 2011, at 2:40 AM, Harsh J wrote:

> Hello,
> 
> On Tue, Apr 5, 2011 at 2:53 AM, W.P. McNeill <billmcn@gmail.com> wrote:
>> If I try:
>> 
>>     storePath = FileOutputFormat.getPathForWorkFile(context, "my-file", ".seq");
>>     writer = SequenceFile.createWriter(FileSystem.getLocal(configuration),
>>         configuration, storePath, IntWritable.class, itemClass);
>>     ...
>>     reader = new SequenceFile.Reader(FileSystem.getLocal(configuration),
>>         storePath, configuration);
>> 
>> I get an exception about a mismatch in file systems when trying to read from
>> the file.
>> 
>> Alternately if I try:
>> 
>>     storePath = new Path(SequenceFileOutputFormat.getUniqueFile(context, "my-file", ".seq"));
>>     writer = SequenceFile.createWriter(FileSystem.get(configuration),
>>         configuration, storePath, IntWritable.class, itemClass);
>>     ...
>>     reader = new SequenceFile.Reader(FileSystem.getLocal(configuration),
>>         storePath, configuration);
> 
> FileOutputFormat.getPathForWorkFile will give back HDFS paths. And
> since you are creating local temporary files used only within the
> task itself, you don't really need to derive unique filenames
> (relying on them can go wrong).
> 
> You're looking for the tmp/ directory locally created in the FS where
> the Task is running (at ${mapred.child.tmp}, which defaults to ./tmp).
> You can create a regular file there using vanilla Java APIs for files,
> or using RawLocalFS + your own created Path (not derived via
> OutputFormat/etc.).
> 
>>     storePath = new Path(context.getConfiguration().get("mapred.child.tmp"), "my-file.seq");
>>     writer = SequenceFile.createWriter(FileSystem.getLocal(configuration),
>>         configuration, storePath, IntWritable.class, itemClass);
>>     ...
>>     reader = new SequenceFile.Reader(FileSystem.getLocal(configuration),
>>         storePath, configuration);
> 
> The above should work, I think (haven't tried, but the idea is to use
> the mapred.child.tmp).
> 
> Also see: http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Directory+Structure
> 
> -- 
> Harsh J
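To make Harsh's "vanilla Java APIs" suggestion concrete, here is a minimal sketch that writes and reads back a scratch file under a ./tmp directory. The directory name mirrors the mapred.child.tmp default relative to the task's working directory; the file name and contents are illustrative, and this uses only java.nio, not the SequenceFile API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalTmpFileDemo {
    // Write data to a scratch file under tmpDir, read it back with the
    // same plain-Java APIs, then clean up (as a task should on exit).
    static String roundTrip(Path tmpDir, String name, String data) throws IOException {
        Files.createDirectories(tmpDir);          // "./tmp" may not exist yet
        Path scratch = tmpDir.resolve(name);
        Files.write(scratch, data.getBytes());
        String back = new String(Files.readAllBytes(scratch));
        Files.deleteIfExists(scratch);            // per-task cleanup
        return back;
    }

    public static void main(String[] args) throws IOException {
        // "./tmp" mirrors the mapred.child.tmp default; names are illustrative.
        System.out.println(roundTrip(Paths.get("./tmp"), "my-file.seq", "reducer scratch data"));
    }
}
```

Since each task attempt runs in its own working directory, a relative ./tmp path like this is already isolated per reducer without any extra uniqueness in the file name.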

