hadoop-common-user mailing list archives

From Meghana <meghana.mara...@germinait.com>
Subject Reader/Writer problem in HDFS
Date Thu, 28 Jul 2011 08:08:17 GMT
Hi,

We have a job where the map tasks are given the path to an output folder.
Each map task writes a single file to that folder. There is no reduce phase.
There is another thread, which constantly looks for new files in the output
folder. When it finds one, it persists the contents to the index and deletes the file.
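The reader thread described above might look like the following minimal sketch. It uses java.nio on the local filesystem purely for illustration; on HDFS the equivalent calls would be FileSystem.listStatus(), open(), and delete(). The directory layout, the single poll pass, and the "index" stand-in are all assumptions for the example, not part of the original message.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class OutputPoller {
    // One pass of the polling reader: list the output folder, consume
    // each file ("persist to index" is simulated by reading its bytes),
    // then delete it. Returns how many files were consumed.
    static int drainOnce(Path outputDir) throws IOException {
        int indexed = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(outputDir)) {
            for (Path file : files) {
                byte[] contents = Files.readAllBytes(file); // stand-in for indexing
                Files.delete(file);                         // remove once consumed
                indexed++;
            }
        }
        return indexed;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("outbox");
        Files.write(dir.resolve("part-00000"), "xyz".getBytes());
        System.out.println(drainOnce(dir));
    }
}
```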

We use this code in the map task:
OutputStream oStream = null;
try {
    oStream = fileSystem.create(path);
    IOUtils.write("xyz", oStream);
} finally {
    IOUtils.closeQuietly(oStream);
}

The problem: Sometimes the reader thread sees and tries to read a file which
is not yet fully written to HDFS (or whose checksum is not written yet, etc.),
and throws an error. Is it possible to write an HDFS file in such a way that
it won't be visible until it is fully written?

We use Hadoop 0.20.203.

Thanks,

Meghana
