hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: How do I create per-reducer temporary files?
Date Sat, 09 Apr 2011 09:40:09 GMT
Hello,

On Tue, Apr 5, 2011 at 2:53 AM, W.P. McNeill <billmcn@gmail.com> wrote:
> If I try:
>
>      storePath = FileOutputFormat.getPathForWorkFile(context, "my-file",
> ".seq");
>      writer = SequenceFile.createWriter(FileSystem.getLocal(configuration),
>            configuration, storePath, IntWritable.class, itemClass);
>      ...
>      reader = new SequenceFile.Reader(FileSystem.getLocal(configuration),
> storePath, configuration);
>
> I get an exception about a mismatch in file systems when trying to read from
> the file.
>
> Alternately if I try:
>
>      storePath = new Path(SequenceFileOutputFormat.getUniqueFile(context,
> "my-file", ".seq"));
>      writer = SequenceFile.createWriter(FileSystem.get(configuration),
>            configuration, storePath, IntWritable.class, itemClass);
>      ...
>      reader = new SequenceFile.Reader(FileSystem.getLocal(configuration),
> storePath, configuration);

FileOutputFormat.getPathForWorkFile will give back HDFS paths. And
since you are looking to create local temporary files that are used
only within the task itself, you shouldn't really need to worry about
unique filenames (stuff can go wrong).

You're looking for the tmp/ directory created locally under the
directory where the Task is running (at ${mapred.child.tmp}, which
defaults to ./tmp). You can create a regular file there using vanilla
Java file APIs, or using the local FileSystem (RawLocalFileSystem) plus
a Path you construct yourself (not one derived via OutputFormat/etc.).
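
For the plain-Java route, a minimal sketch (untested; "scratch.dat" is
just a placeholder name, and it needs java.io.File / FileWriter /
BufferedWriter):

     // Task-private scratch file under ${mapred.child.tmp}, which is
     // relative to the task's working directory and defaults to ./tmp.
     String tmpDir = context.getConfiguration().get("mapred.child.tmp", "./tmp");
     File scratch = new File(tmpDir, "scratch.dat");
     scratch.getParentFile().mkdirs();  // usually exists already
     BufferedWriter out = new BufferedWriter(new FileWriter(scratch));
     out.write("task-local data");
     out.close();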

>      storePath = new Path(context.getConfiguration().get("mapred.child.tmp"), "my-file.seq");
>      writer = SequenceFile.createWriter(FileSystem.getLocal(configuration),
>            configuration, storePath, IntWritable.class, itemClass);
>      ...
>      reader = new SequenceFile.Reader(FileSystem.getLocal(configuration),
> storePath, configuration);

The above should work, I think (I haven't tried it, but the idea is to
use the mapred.child.tmp directory).
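
Roughly, the full round trip with the local filesystem on both ends
would look like this (an untested sketch; IntWritable/Text are just
example key/value types):

     Configuration configuration = context.getConfiguration();
     FileSystem localFs = FileSystem.getLocal(configuration);
     Path storePath = new Path(configuration.get("mapred.child.tmp", "./tmp"),
         "my-file.seq");

     // Write with the local filesystem...
     SequenceFile.Writer writer = SequenceFile.createWriter(localFs,
         configuration, storePath, IntWritable.class, Text.class);
     writer.append(new IntWritable(1), new Text("value"));
     writer.close();

     // ...and read it back with the same (local) filesystem, so the
     // schemes match and you don't hit the wrong-FS exception.
     SequenceFile.Reader reader = new SequenceFile.Reader(localFs, storePath,
         configuration);
     IntWritable key = new IntWritable();
     Text value = new Text();
     while (reader.next(key, value)) {
       // use key/value here
     }
     reader.close();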

Also see: http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Directory+Structure

-- 
Harsh J
