From: Robin East
Subject: Re: How to update a file which is in HDFS
Date: Fri, 5 Jul 2013 08:45:24 +0100
To: user@hadoop.apache.org
The API for 1.1.2 FileSystem seems to include append().

Robin

On 5 Jul 2013, at 01:50, Mohammad Tariq <dontariq@gmail.com> wrote:

> The current stable release doesn't support append, not even through the API. If you really want this, you have to switch to Hadoop 2.x.
> See this JIRA.
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Fri, Jul 5, 2013 at 3:05 AM, John Lilley <john.lilley@redpoint.net> wrote:
>
> Manickam,
>
> HDFS supports append; it is the command-line client that does not.
>
> You can write a Java application that opens an HDFS-based file for append, and use that instead of the hadoop command line.
>
> However, this doesn't completely answer your original question: "How do I move only the delta part?" This can be more complex than simply doing an append. Have records in the original file changed, in addition to new records becoming available? If so, you will need to rewrite the file completely, as there is no overwriting of existing file sections, even when using HDFS directly. There are clever strategies for working around this, such as splitting the file into multiple parts on HDFS so that the overwrite can proceed in parallel on the cluster; however, that may be more work than you are looking for. Even if the delta is limited to new records, the problem may not be trivial. How do you know which records are new? Are all of the new records at the end of the file? Or can they be anywhere in the file? If the latter, you will need more complex logic.
>
> John
>
> From: Mohammad Tariq [mailto:dontariq@gmail.com]
> Sent: Thursday, July 04, 2013 5:47 AM
> To: user@hadoop.apache.org
> Subject: Re: How to update a file which is in HDFS
>
> Hello Manickam,
>
> Append is currently not possible.
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
> On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <manickam.p@outlook.com> wrote:
>
> Hi,
>
> I have moved my input file into the HDFS location in the cluster setup.
> Now I got a new set of the file which has some new records along with the old ones.
> I want to move only the delta part into HDFS, because moving the whole file from my local machine to HDFS takes too long.
> Is it possible, or do I need to move the entire file into HDFS again?
>
> Thanks,
> Manickam P
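Putting Robin's and John's suggestions together, for the easy case John identifies (the new file is strictly the old file plus records appended at the end), the delta is just the byte range past the old length. A minimal sketch follows; it uses plain java.io so it can run anywhere, and the file names are illustrative. On a real cluster the output stream would instead be the one returned by FileSystem.append() (per Robin, present in the 1.1.2 API; fully supported in 2.x):

```java
import java.io.*;

public class TailCopy {
    // Copy bytes [fromOffset, src.length()) of src into out; return bytes copied.
    // On HDFS, 'out' would be the stream returned by fs.append(new Path(...)) --
    // the local-file version here only illustrates the delta-extraction step.
    static long copyTail(File src, long fromOffset, OutputStream out) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(src, "r")) {
            in.seek(fromOffset);
            byte[] buf = new byte[8192];
            long copied = 0;
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
                copied += n;
            }
            return copied;
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate: the HDFS copy already holds "old records\n" (12 bytes);
        // the new local file has additional records appended after it.
        File src = File.createTempFile("newfile", ".txt");
        src.deleteOnExit();
        try (FileWriter w = new FileWriter(src)) {
            w.write("old records\nnew records\n");
        }
        long oldLen = "old records\n".length(); // length already present on HDFS
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copyTail(src, oldLen, out);
        System.out.println(copied + ":" + out.toString("UTF-8").trim());
        // prints "12:new records"
    }
}
```

This only works if new records are guaranteed to land at the end; if existing records can change anywhere in the file, you are back to John's full-rewrite (or split-and-rewrite-in-parallel) strategies.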