hadoop-user mailing list archives

From Peter Sheridan <psheri...@millennialmedia.com>
Subject Re: Detect when file is not being written by another process
Date Tue, 25 Sep 2012 16:53:32 GMT
These are log files being deposited by other processes, which we may not have control over.

We don't want multiple processes to write to the same files — we just don't want to start
our jobs until they have been completely written.
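
One possible way to approximate "completely written" without any cooperation from the writers is to poll each file's metadata and only start the job once nothing has changed for a quiet period. The following is a minimal sketch of that idea, not a confirmed Hadoop recipe. The `probe` callable is hypothetical: it is assumed to return the file's length and modification time (for example, values read from `FileSystem.getFileStatus()` via `getLen()` and `getModificationTime()`), or `None` if the file does not exist yet. It is a heuristic only: metadata reported for a file that is still open may lag behind the bytes actually written, so the quiet period would need to be chosen generously.

```python
import time
from typing import Callable, Optional, Tuple

# (length_in_bytes, modification_time) as reported by the file system.
FileState = Tuple[int, float]


def wait_until_stable(
    probe: Callable[[], Optional[FileState]],
    quiet_seconds: float,
    poll_seconds: float,
    timeout_seconds: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once probe() has reported the same state for quiet_seconds.

    Returns False if timeout_seconds elapses first. probe() is a
    hypothetical callable that returns None while the file is missing.
    """
    start = clock()
    last_state: Optional[FileState] = None
    last_change = start
    while True:
        now = clock()
        state = probe()
        if state != last_state:
            # Size or modification time moved: restart the quiet period.
            last_state = state
            last_change = now
        elif state is not None and now - last_change >= quiet_seconds:
            # File exists and has not changed for long enough.
            return True
        if now - start >= timeout_seconds:
            return False
        sleep(poll_seconds)
```

The clock and sleep functions are injectable so the loop can be exercised without real waiting. If the writers can ever be changed, having them write to a temporary name and rename on completion would be more reliable than any polling heuristic.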

Sorry for lack of clarity & thanks for the response.


--Pete

From: Bertrand Dechoux <dechouxb@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Tuesday, September 25, 2012 12:33 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: Detect when file is not being written by another process

Hi,

Multiple files and aggregation or something like hbase?

Could you tell us more about your context? What are the volumes? Why do you want multiple
processes to write to the same file?

Regards

Bertrand

On Tue, Sep 25, 2012 at 6:28 PM, Peter Sheridan <psheridan@millennialmedia.com> wrote:
Hi all.

We're using Hadoop 1.0.3.  We need to pick up a set of large (4+GB) files when they've finished
being written to HDFS by a different process.  There doesn't appear to be an API specifically
for this.  We had discovered through experimentation that the FileSystem.append() method can
be used for this purpose — it will fail if another process is writing to the file.

However: when running this on a multi-node cluster, using that API actually corrupts the file.
Perhaps this is a known issue?  Looking at the bug tracker I see https://issues.apache.org/jira/browse/HDFS-265
and a bunch of similar-sounding things.

What's the right way to solve this problem?  Thanks.


--Pete




--
Bertrand Dechoux
