Subject: Re: Detect when file is not being written by another process
From: Bertrand Dechoux <dechouxb@gmail.com>
To: user@hadoop.apache.org
Date: Tue, 25 Sep 2012 18:33:20 +0200

Hi,

Multiple files and aggregation, or something like HBase?

Could you tell us more about your context? What are the volumes? Why do you want multiple processes to write to the same file?

Regards

Bertrand

On Tue, Sep 25, 2012 at 6:28 PM, Peter Sheridan <psheridan@millennialmedia.com> wrote:

> Hi all.
>
> We're using Hadoop 1.0.3. We need to pick up a set of large (4+ GB)
> files when they've finished being written to HDFS by a different process.
> There doesn't appear to be an API specifically for this. We discovered
> through experimentation that the FileSystem.append() method can be used
> for this purpose: it will fail if another process is writing to the file.
>
> However: when running this on a multi-node cluster, using that API
> actually corrupts the file. Perhaps this is a known issue? Looking at the
> bug tracker I see https://issues.apache.org/jira/browse/HDFS-265 and a
> bunch of similar-sounding things.
>
> What's the right way to solve this problem? Thanks.
>
> --Pete

--
Bertrand Dechoux
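One convention often used for this kind of hand-off (a sketch, not something proposed in the thread) is to have the writer produce the file under a temporary name and rename it into place only when it is finished; readers then pick up only final names and never see a half-written file. The sketch below demonstrates the pattern on the local filesystem with `java.nio.file` as a stand-in for the HDFS `FileSystem` API; the class and method names are illustrative, and on HDFS the equivalent calls would be `FileSystem.create()` on a temporary path followed by `FileSystem.rename()`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Illustrative sketch of the "write to a temp name, rename when done" hand-off.
// Uses the local filesystem; on HDFS the same shape would use the
// org.apache.hadoop.fs.FileSystem create/rename calls instead.
public class AtomicHandoff {

    // Writer side: write the whole file under a ".tmp" suffix, then
    // rename it into place so readers never observe a partial file.
    public static Path writeThenPublish(Path dir, String name, byte[] data)
            throws IOException {
        Path tmp = dir.resolve(name + ".tmp");
        Path fin = dir.resolve(name);
        Files.write(tmp, data);
        // ATOMIC_MOVE makes the hand-off all-or-nothing within one directory.
        Files.move(tmp, fin, StandardCopyOption.ATOMIC_MOVE);
        return fin;
    }

    // Reader side: only pick up files that no longer carry the ".tmp" suffix.
    public static boolean isComplete(Path p) {
        return !p.getFileName().toString().endsWith(".tmp");
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("handoff");
        Path done = writeThenPublish(dir, "part-00000", "hello".getBytes());
        System.out.println(isComplete(done));                 // true
        System.out.println(Files.readAllBytes(done).length);  // 5
    }
}
```

The appeal of this design over probing with `FileSystem.append()` is that no reader ever opens an in-flight file at all, so it sidesteps the corruption described above; the cost is that the writing process has to cooperate by adopting the naming convention.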