Subject: Re: Detect when file is not being written by another process
From: Bertrand Dechoux <dechouxb@gmail.com>
To: user@hadoop.apache.org
Date: Tue, 25 Sep 2012 18:33:20 +0200

Hi,

Multiple files and aggregation, or something like HBase?

Could you tell us more about your context? What are the volumes? Why do you want multiple processes to write to the same file?

Regards

Bertrand

On Tue, Sep 25, 2012 at 6:28 PM, Peter Sheridan <psheridan@millennialmedia.com> wrote:

> Hi all.
>
> We're using Hadoop 1.0.3. We need to pick up a set of large (4+ GB)
> files when they've finished being written to HDFS by a different process.
> There doesn't appear to be an API specifically for this. We discovered
> through experimentation that the FileSystem.append() method can be used
> for this purpose: it will fail if another process is writing to the file.
>
> However: when running this on a multi-node cluster, using that API
> actually corrupts the file. Perhaps this is a known issue? Looking at the
> bug tracker I see https://issues.apache.org/jira/browse/HDFS-265 and a
> bunch of similar-sounding things.
>
> What's the right way to solve this problem? Thanks.
>
> --Pete

--
Bertrand Dechoux
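One convention often used for this kind of hand-off (a sketch, not something proposed in the thread) is to have the writer produce the file under a temporary name and rename it into place only when it is finished; readers then pick up only final names and never see a half-written file. The sketch below demonstrates the pattern on the local filesystem with `java.nio.file` as a stand-in for the HDFS `FileSystem` API; the class and method names are illustrative, and on HDFS the equivalent calls would be `FileSystem.create()` on a temporary path followed by `FileSystem.rename()`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Illustrative sketch of the "write to a temp name, rename when done" hand-off.
// Uses the local filesystem; on HDFS the same shape would use the
// org.apache.hadoop.fs.FileSystem create/rename calls instead.
public class AtomicHandoff {

    // Writer side: write the whole file under a ".tmp" suffix, then
    // rename it into place so readers never observe a partial file.
    public static Path writeThenPublish(Path dir, String name, byte[] data)
            throws IOException {
        Path tmp = dir.resolve(name + ".tmp");
        Path fin = dir.resolve(name);
        Files.write(tmp, data);
        // ATOMIC_MOVE makes the hand-off all-or-nothing within one directory.
        Files.move(tmp, fin, StandardCopyOption.ATOMIC_MOVE);
        return fin;
    }

    // Reader side: only pick up files that no longer carry the ".tmp" suffix.
    public static boolean isComplete(Path p) {
        return !p.getFileName().toString().endsWith(".tmp");
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("handoff");
        Path done = writeThenPublish(dir, "part-00000", "hello".getBytes());
        System.out.println(isComplete(done));                 // true
        System.out.println(Files.readAllBytes(done).length);  // 5
    }
}
```

The appeal of this design over probing with `FileSystem.append()` is that no reader ever opens an in-flight file at all, so it sidesteps the corruption described above; the cost is that the writing process has to cooperate by adopting the naming convention.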