From: Robin East <robin.east@xense.co.uk>
To: user@hadoop.apache.org
Subject: Re: How to update a file which is in HDFS
Date: Fri, 5 Jul 2013 08:54:15 +0100

Ok, just read the JIRA in detail (pays to read these things before posting). It says:

Append is not supported in Hadoop 1.x. Please upgrade to 2.x if you need append. If you enabled dfs.support.append for HBase, you're OK, as durable sync (why HBase required dfs.support.append) is now enabled by default. If you really need the previous functionality, to turn on the append functionality set the flag "dfs.support.broken.append" to true.

That says to me you can have append working if you set dfs.support.broken.append to true. So append appears to be available in 1.x, but it is hardly recommended.
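
For anyone who does want to experiment, my reading is that it comes down to one property in hdfs-site.xml (untested on my side, and the property name alone is a warning):

    <!-- hdfs-site.xml: re-enables the old 1.x append path -->
    <property>
      <name>dfs.support.broken.append</name>
      <value>true</value>
    </property>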

Robin


On 5 Jul 2013, at 08:45, Robin East <robin.east@xense.co.uk> wrote:

The API for 1.1.2 FileSystem seems to include append().
Robin
On 5 Jul 2013, at 01:50, Mohammad Tariq <dontariq@gmail.com> wrote:

The current stable release doesn't support append, not even through the API. If you really want this you have to switch to Hadoop 2.x.
See this JIRA.

Warm Regards,
Tariq
cloudfront.blogspot.com

On Fri, Jul 5, 2013 at 3:05 AM, John Lilley <john.lilley@redpoint.net> wrote:

Manickam,

HDFS supports append; it is the command-line client that does not.

You can write a Java application that opens an HDFS-based file for append, and use that instead of the hadoop command line.
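
For example, something along these lines (the paths are made up, and it assumes append is actually enabled on your cluster):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsAppend {
        public static void main(String[] args) throws Exception {
            // Picks up core-site.xml / hdfs-site.xml from the classpath
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/data/input.txt"); // made-up path
            // Throws if the cluster does not support/allow append
            FSDataOutputStream out = fs.append(file);
            try {
                out.write("new records\n".getBytes("UTF-8"));
            } finally {
                out.close();
            }
        }
    }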

However, this doesn't completely answer your original question: "How do I move only the delta part?" This can be more complex than simply doing an append. Have records in the original file changed in addition to new records becoming available? If that is the case, you will need to completely rewrite the file, as there is no overwriting of existing file sections, even directly using HDFS. There are clever strategies for working around this, like splitting the file into multiple parts on HDFS so that the overwrite can proceed in parallel on the cluster; however, that may be more work than you are looking for. Even if the delta is limited to new records, the problem may not be trivial. How do you know which records are new? Are all of the new records at the end of the file? Or can they be anywhere in the file? If the latter, you will need more complex logic.
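
If the new records are guaranteed to be at the end, the delta copy could look roughly like this (again made-up paths; it assumes the HDFS copy is an exact byte-for-byte prefix of the local file, and that append is enabled):

    import java.io.RandomAccessFile;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DeltaCopy {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path hdfsFile = new Path("/data/input.txt");         // made-up path
            long uploaded = fs.getFileStatus(hdfsFile).getLen(); // bytes HDFS already has
            RandomAccessFile local =
                new RandomAccessFile("/local/input.txt", "r");   // made-up path
            local.seek(uploaded); // skip the prefix that is already in HDFS
            FSDataOutputStream out = fs.append(hdfsFile);
            try {
                byte[] buf = new byte[64 * 1024];
                int n;
                while ((n = local.read(buf)) > 0) {
                    out.write(buf, 0, n);
                }
            } finally {
                out.close();
                local.close();
            }
        }
    }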

John

From: Mohammad Tariq [mailto:dontariq@gmail.com]
Sent: Thursday, July 04, 2013 5:47 AM
To: user@hadoop.apache.org
Subject: Re: How to update a file which is in HDFS

Hello Manickam,

        Append is currently not possible.

Warm Regards,
Tariq
cloudfront.blogspot.com

On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <manickam.p@outlook.com> wrote:

Hi,

I have moved my input file into the HDFS location in the cluster setup.

Now I have got a new file which has some new records along with the old ones.

I want to move only the delta part into HDFS, because it takes a long time to move the whole file from my local machine to the HDFS location.

Is this possible, or do I need to move the entire file into HDFS again?

Thanks,
Manickam P