hadoop-common-user mailing list archives

From "W.P. McNeill" <bill...@gmail.com>
Subject Re: How do I create per-reducer temporary files?
Date Mon, 04 Apr 2011 23:30:04 GMT
I recall that at one point I made an API call that created a writable file
beneath the _logs directory of my job. I think this is exactly what I need,
but I can't remember what I did, and I'm having a hard time figuring it
out from the online API documentation.

On Mon, Apr 4, 2011 at 2:23 PM, W.P. McNeill <billmcn@gmail.com> wrote:

> I have a Hadoop reducer that needs to write then read (key, value) pairs
> from a local temporary file. It seems like the way to do this is with
> Sequence Files, letting the Hadoop API choose their names for me, but I
> haven't found the right combination of API calls to make it work.
>
> If I try:
>
>       storePath = FileOutputFormat.getPathForWorkFile(context, "my-file", ".seq");
>       writer = SequenceFile.createWriter(FileSystem.getLocal(configuration),
>             configuration, storePath, IntWritable.class, itemClass);
>       ...
>       reader = new SequenceFile.Reader(FileSystem.getLocal(configuration),
>             storePath, configuration);
>
> I get an exception about a mismatch in file systems when trying to read
> from the file.
>
> java.lang.IllegalArgumentException: Wrong FS: hdfs://chnode1.tuk2.com:40020/user/bmcneill/blocking/1.8/intelius-names/extremal-sets/_temporary/_attempt_201103311636_0393_r_000000_0/my-file-r-00000.seq, expected: file:///
>     at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:352)
>     at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:368)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
>     at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:718)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
>
> Alternately if I try:
>
>       storePath = new Path(SequenceFileOutputFormat.getUniqueFile(context, "my-file", ".seq"));
>       writer = SequenceFile.createWriter(FileSystem.get(configuration),
>             configuration, storePath, IntWritable.class, itemClass);
>       ...
>       reader = new SequenceFile.Reader(FileSystem.getLocal(configuration),
>             storePath, configuration);
>
> The file is missing when I try to read from it.
>
> java.io.FileNotFoundException: File my-file-r-00000.seq does not exist.
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:372)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
>     at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:718)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
>
> Am I using the wrong APIs, or is my problem elsewhere?
>
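
Both stack traces seem to come from the same mismatch: the file is written through one filesystem and read back through another. In the first attempt, getPathForWorkFile returns a fully qualified hdfs:// path, which the local filesystem rejects ("Wrong FS ... expected: file:///"). In the second, getUniqueFile returns only a bare name (my-file-r-00000.seq), so the writer (FileSystem.get, which is HDFS on this cluster) and the reader (FileSystem.getLocal) likely resolve that relative name against different working directories. One untested way to keep them consistent in Hadoop would be to take the filesystem from the path itself, via storePath.getFileSystem(configuration), and pass that same object to both SequenceFile.createWriter and the SequenceFile.Reader constructor. The self-contained sketch below illustrates only the general principle using the Java standard library: plain java.io data streams stand in for SequenceFile, and one absolute, uniquely named temporary file per task is used for both the write and the read. The class name TempPairFile, the task-ID prefix, and the (int, String) record layout are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Stdlib-only sketch: java.io data streams stand in for Hadoop's SequenceFile.
public class TempPairFile {

    // Writes (int key, String value) pairs, preceded by a record count.
    public static void writePairs(Path file, Map<Integer, String> pairs) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(pairs.size());
            for (Map.Entry<Integer, String> e : pairs.entrySet()) {
                out.writeInt(e.getKey());
                out.writeUTF(e.getValue());
            }
        }
    }

    // Reads pairs written by writePairs, preserving their original order.
    public static Map<Integer, String> readPairs(Path file) throws IOException {
        Map<Integer, String> result = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int key = in.readInt();
                result.put(key, in.readUTF());
            }
        }
        return result;
    }

    // Creates one uniquely named temp file for a task, writes the pairs,
    // reads them back through the very same Path, then deletes the file.
    public static Map<Integer, String> roundTrip(String taskId, Map<Integer, String> pairs)
            throws IOException {
        // Make the location absolute once so writer and reader cannot disagree.
        Path file = Files.createTempFile(taskId + "-my-file-", ".dat").toAbsolutePath();
        try {
            writePairs(file, pairs);
            return readPairs(file);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, String> pairs = new LinkedHashMap<>();
        pairs.put(1, "alpha");
        pairs.put(2, "beta");
        System.out.println(roundTrip("attempt_r_000000_0", pairs));
    }
}
```

The design choice worth noting is that the location is computed once and the same value is handed to both the writer and the reader, so neither side reinterprets a relative name on its own. The same idea carries over to the Hadoop case: one qualified Path, one FileSystem object, used for both operations.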
