hadoop-common-user mailing list archives

From Meghana <meghana.mara...@germinait.com>
Subject Re: Reader/Writer problem in HDFS
Date Thu, 28 Jul 2011 11:01:59 GMT
Thanks Laxman! That would definitely help things. :)

Is there a better FileSystem (or other) method call that creates a file in one
go (i.e. atomically, I guess), without having to call create() and then write
to the stream?

..meghana


On 28 July 2011 16:12, Laxman <lakshman_ch@huawei.com> wrote:

> One approach is to use a ".tmp" extension while writing. Once the write
> is complete, rename the file to its original name. The reducer also has to
> filter out ".tmp" files.
>
> This ensures the reducer will not pick up partial files.
>
> We have a similar scenario, where the above-mentioned approach resolved the
> issue.
>
> -----Original Message-----
> From: Meghana [mailto:meghana.marathe@germinait.com]
> Sent: Thursday, July 28, 2011 1:38 PM
> To: common-user; hdfs-user@hadoop.apache.org
> Subject: Reader/Writer problem in HDFS
>
> Hi,
>
> We have a job where the map tasks are given the path to an output folder.
> Each map task writes a single file to that folder. There is no reduce phase.
> Another thread constantly looks for new files in the output folder. If it
> finds one, it persists the contents to an index and deletes the file.
>
> We use this code in the map task:
> // Declared outside the try block so that it is in scope in finally
> OutputStream oStream = null;
> try {
>    oStream = fileSystem.create(path);
>    IOUtils.write("xyz", oStream);
> } finally {
>    IOUtils.closeQuietly(oStream);
> }
>
> The problem: Sometimes the reader thread sees and tries to read a file that
> is not yet fully written to HDFS (or whose checksum is not written yet, etc.),
> and throws an error. Is it possible to write an HDFS file in such a way that
> it won't be visible until it is fully written?
>
> We use Hadoop 0.20.203.
>
> Thanks,
>
> Meghana
>
>
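
The ".tmp"-then-rename approach suggested in this thread could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses java.nio.file on a local directory as a stand-in so that it runs without a cluster, and the class and method names (TmpThenRename, writeThenPublish, completedFiles) are hypothetical. On HDFS the corresponding calls would presumably be FileSystem.create() for the write and FileSystem.rename() for the publish step.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class TmpThenRename {
    // Suffix marking files that are still being written (an assumed convention).
    static final String TMP_SUFFIX = ".tmp";

    // Writer side: write everything under a temporary name, then rename.
    // On HDFS this would be FileSystem.create(tmp) followed by
    // FileSystem.rename(tmp, fin); here the local filesystem stands in.
    static Path writeThenPublish(Path dir, String name, String contents) throws IOException {
        Path tmp = dir.resolve(name + TMP_SUFFIX);
        Path fin = dir.resolve(name);
        Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
        // The final name only appears once the data is fully written.
        return Files.move(tmp, fin, StandardCopyOption.ATOMIC_MOVE);
    }

    // Reader side: list the folder but skip anything still carrying the suffix.
    static List<Path> completedFiles(Path dir) throws IOException {
        List<Path> done = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (!p.getFileName().toString().endsWith(TMP_SUFFIX)) {
                    done.add(p);
                }
            }
        }
        return done;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("output");
        // Simulate another writer that is still in progress.
        Files.write(dir.resolve("part-1" + TMP_SUFFIX), new byte[0]);
        writeThenPublish(dir, "part-0", "xyz");
        // Only the published file is listed; the in-progress one is skipped.
        System.out.println(completedFiles(dir));
    }
}
```

The reader never needs to know whether a write is in progress: it ignores ".tmp" names, and a file under its final name has already been closed by the writer.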
