hadoop-hdfs-user mailing list archives

From Bertrand Dechoux <decho...@gmail.com>
Subject Re: Detect when file is not being written by another process
Date Tue, 25 Sep 2012 16:33:20 GMT

Multiple files and aggregation, or something like HBase?

Could you tell us more about your context? What are the volumes? Why do
you want multiple processes to write to the same file?



On Tue, Sep 25, 2012 at 6:28 PM, Peter Sheridan <
psheridan@millennialmedia.com> wrote:

>  Hi all.
>  We're using Hadoop 1.0.3.  We need to pick up a set of large (4+GB)
> files when they've finished being written to HDFS by a different process.
>  There doesn't appear to be an API specifically for this.  We had
> discovered through experimentation that the FileSystem.append() method can
> be used for this purpose — it will fail if another process is writing to
> the file.
>  However: when running this on a multi-node cluster, using that API
> actually corrupts the file.  Perhaps this is a known issue?  Looking at the
> bug tracker I see https://issues.apache.org/jira/browse/HDFS-265 and a
> bunch of similar-sounding things.
>  What's the right way to solve this problem?  Thanks.
>  --Pete
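
The probe Peter describes can be sketched as below. This is a minimal illustration only, assuming Hadoop 1.0.3 on the classpath and a reachable HDFS; the class and method names (AppendProbe, isClosedForWriting) are made up for the example. Note the caveat from the thread: merely acquiring the append lease on a multi-node cluster has been observed to corrupt the file, so treat this as a description of the experiment, not a recommendation.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendProbe {

    // Try to open the file for append. HDFS allows a single writer per
    // file, so append() throws if another client still holds the lease.
    // We close the stream immediately without writing any bytes.
    static boolean isClosedForWriting(FileSystem fs, Path p) {
        try {
            fs.append(p).close();
            return true;   // lease acquired: no other process was writing
        } catch (IOException e) {
            return false;  // lease held elsewhere, or append not supported
        }
    }

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        System.out.println(isClosedForWriting(fs, new Path(args[0])));
    }
}
```

Because of the corruption risk on 1.0.3, a safer convention (not from this thread) is for the writer to create the file under a temporary name and rename it into place when done, so readers only ever see completed files.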

Bertrand Dechoux
