From: Peter Sheridan <psheridan@millennialmedia.com>
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: Detect when file is not being written by another process
Date: Tue, 25 Sep 2012 16:53:32 +0000
These are log files being deposited by other processes, which we may not have control over.

We don't want multiple processes to write to the same files; we just don't want to start our jobs until they have been completely written.

Sorry for lack of clarity & thanks for the response.


--Pete

From: Bertrand Dechoux <dechouxb@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Tuesday, September 25, 2012 12:33 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: Detect when file is not being written by another process

Hi,

Multiple files and aggregation or something like hbase?

Could you tell us more about your context? What are the volumes? Why do you want multiple processes to write to the same file?

Regards

Bertrand

On Tue, Sep 25, 2012 at 6:28 PM, Peter Sheridan <psheridan@millennialmedia.com> wrote:
Hi all.

We're using Hadoop 1.0.3. We need to pick up a set of large (4+GB) files when they've finished being written to HDFS by a different process. There doesn't appear to be an API specifically for this. We discovered through experimentation that the FileSystem.append() method can be used for this purpose; it will fail if another process is writing to the file.

However: when running this on a multi-node cluster, using that API actually corrupts the file. Perhaps this is a known issue? Looking at the bug tracker I see https://issues.apache.org/jira/browse/HDFS-265 and a bunch of similar-sounding things.

What's the right way to solve this problem?  Thanks.


--Pete




--
Bertrand Dechoux