hadoop-common-user mailing list archives

From Peter Sheridan <psheri...@millennialmedia.com>
Subject Detect when file is not being written by another process
Date Tue, 25 Sep 2012 16:28:27 GMT
Hi all.

We're using Hadoop 1.0.3.  We need to pick up a set of large (4+ GB) files once a different
process has finished writing them to HDFS.  There doesn't appear to be an API specifically
for this.  We discovered through experimentation that the FileSystem.append() method can
be used for this purpose: it fails if another process is writing to the file.

However, when running this on a multi-node cluster, calling append() actually corrupts the file.
Perhaps this is a known issue?  Looking at the bug tracker, I see https://issues.apache.org/jira/browse/HDFS-265
and a number of similar-sounding issues.
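
For illustration, the probe logic described above can be sketched roughly as follows. This is a minimal, self-contained model rather than our actual code: AppendableStore, FakeStore, and isBeingWritten are hypothetical names, and the in-memory FakeStore only stands in for HDFS. Against a real cluster, the append call would presumably be FileSystem.append(Path), which is the call that corrupted the file in the multi-node case, so this shows the idea only and is not a recommendation.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class AppendProbe {

    /** Hypothetical stand-in for the single call the probe needs from a file system. */
    interface AppendableStore {
        // Opens the path for append; throws IOException if another writer holds it.
        Closeable append(String path) throws IOException;
    }

    /** In-memory fake (an assumption for this sketch) tracking paths with an open writer. */
    static class FakeStore implements AppendableStore {
        private final Set<String> openForWrite = new HashSet<>();

        @Override
        public Closeable append(String path) throws IOException {
            if (!openForWrite.add(path)) {
                throw new IOException(path + " is already being written");
            }
            // Closing the returned handle releases the path again.
            return () -> openForWrite.remove(path);
        }
    }

    /** Treats a failed append attempt as "another process is still writing". */
    static boolean isBeingWritten(AppendableStore store, String path) {
        try (Closeable handle = store.append(path)) {
            return false; // append succeeded; the handle is closed without writing anything
        } catch (IOException e) {
            return true;  // append was refused, so a writer is presumed active
        }
    }

    public static void main(String[] args) throws IOException {
        FakeStore store = new FakeStore();
        Closeable writer = store.append("/data/big.log"); // simulated external writer
        System.out.println(isBeingWritten(store, "/data/big.log")); // prints true
        writer.close();
        System.out.println(isBeingWritten(store, "/data/big.log")); // prints false
    }
}
```

Note that the probe is not side-effect free even in this model: a successful attempt briefly opens the file for writing.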

What's the right way to solve this problem?  Thanks.

