hadoop-common-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: How do I create per-reducer temporary files?
Date Wed, 04 May 2011 16:03:45 GMT
Bryan,

I believe that map/reduce gives you a single drive to write to so that your reducer has less
of an impact on other reducers/mappers running on the same box.  If you want to write to more
drives, the idea is to increase the number of reducers you have and let mapred assign each
one a drive to use, instead of having one reducer eat up I/O bandwidth from all of the
drives.
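
Bobby's suggestion is a job-submission setting rather than a code change. A sketch of how it might look on the command line — the jar name, driver class, and paths are placeholders, and the `-D` form assumes the driver goes through `ToolRunner`/`GenericOptionsParser`:

```shell
# Raise the reducer count so mapred can spread reducers across the
# drives listed in mapred.local.dir, one spindle per reducer.
# Jar, class, and input/output paths below are illustrative only.
hadoop jar myjob.jar MyDriver -D mapred.reduce.tasks=10 /input /output
```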

--Bobby Evans

On 5/4/11 7:11 AM, "Bryan Keller" <bryanck@gmail.com> wrote:

I too am looking for the best place to put local temp files I create during reduce processing.
I am hoping there is a variable or property someplace that defines a per-reducer temp directory.
The "mapred.child.tmp" property is by default simply the relative directory "./tmp", so it
isn't useful on its own.

I have 5 drives being used in "mapred.local.dir", and I was hoping to use them all for writing
temp files, rather than specifying a single temp directory that all my reducers use.
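
The reason a relative "./tmp" looks useless on its own is that it only becomes meaningful once resolved against the task's working directory. A minimal sketch of that resolution in plain Java (no Hadoop classes involved; the "./tmp" value is just the documented default):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: resolve the relative "./tmp" that mapred.child.tmp defaults to.
// Relative values resolve against the current working directory, which
// for a running task is its attempt-local directory.
public class ChildTmpPath {
    static String resolve(String childTmp) {
        Path p = Paths.get(childTmp);
        return p.toAbsolutePath().normalize().toString();
    }

    public static void main(String[] args) {
        // Inside a task this would print a path under the attempt dir.
        System.out.println(resolve("./tmp"));
    }
}
```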


On Apr 9, 2011, at 2:40 AM, Harsh J wrote:

> Hello,
>
> On Tue, Apr 5, 2011 at 2:53 AM, W.P. McNeill <billmcn@gmail.com> wrote:
>> If I try:
>>
>>      storePath = FileOutputFormat.getPathForWorkFile(context, "my-file", ".seq");
>>      writer = SequenceFile.createWriter(FileSystem.getLocal(configuration),
>>            configuration, storePath, IntWritable.class, itemClass);
>>      ...
>>      reader = new SequenceFile.Reader(FileSystem.getLocal(configuration),
>>            storePath, configuration);
>>
>> I get an exception about a mismatch in file systems when trying to read from
>> the file.
>>
>> Alternately if I try:
>>
>>      storePath = new Path(SequenceFileOutputFormat.getUniqueFile(context, "my-file", ".seq"));
>>      writer = SequenceFile.createWriter(FileSystem.get(configuration),
>>            configuration, storePath, IntWritable.class, itemClass);
>>      ...
>>      reader = new SequenceFile.Reader(FileSystem.getLocal(configuration),
>>            storePath, configuration);
>
> FileOutputFormat.getPathForWorkFile will give back HDFS paths. And
> since you are looking to create local temporary files to be used only
> by the task within itself, you shouldn't really worry about unique
> filenames (stuff can go wrong).
>
> You're looking for the tmp/ directory locally created in the FS where
> the Task is running (at ${mapred.child.tmp}, which defaults to ./tmp).
> You can create a regular file there using vanilla Java APIs for files,
> or using RawLocalFS + your own created Path (not derived via
> OutputFormat/etc.).
>
>>      storePath = new Path(context.getConfiguration().get("mapred.child.tmp"), "my-file.seq");
>>      writer = SequenceFile.createWriter(FileSystem.getLocal(configuration),
>>            configuration, storePath, IntWritable.class, itemClass);
>>      ...
>>      reader = new SequenceFile.Reader(FileSystem.getLocal(configuration),
>>            storePath, configuration);
>
> The above should work, I think (haven't tried, but the idea is to use
> the mapred.child.tmp).
>
> Also see: http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Directory+Structure
>
> --
> Harsh J
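
Harsh's suggestion of creating the file with vanilla Java APIs (rather than deriving a Path through an OutputFormat) can be sketched as follows. The directory and prefix names are illustrative; only the "./tmp" default comes from the thread:

```java
import java.io.File;
import java.io.IOException;

// Sketch: create a per-task scratch file under ./tmp (the mapred.child.tmp
// default) with plain Java, no OutputFormat involved.
public class LocalScratch {
    static File createScratchFile(File tmpDir, String prefix) throws IOException {
        // Make sure the tmp directory exists before handing out files in it.
        if (!tmpDir.isDirectory() && !tmpDir.mkdirs()) {
            throw new IOException("cannot create " + tmpDir);
        }
        // createTempFile appends a random component, so concurrent tasks
        // sharing a node will not collide on file names.
        return File.createTempFile(prefix, ".seq", tmpDir);
    }

    public static void main(String[] args) throws IOException {
        File scratch = createScratchFile(new File("./tmp"), "my-file");
        scratch.deleteOnExit();
        System.out.println(scratch.getPath());
    }
}
```

The resulting `File` can then be wrapped in a local-filesystem `Path` for `SequenceFile.createWriter`, as in the snippet above.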


