From: Robin East <robin.east@xense.co.uk>
To: user@hadoop.apache.org
Subject: Re: How to update a file which is in HDFS
Date: Fri, 5 Jul 2013 08:54:15 +0100

Ok, just read the JIRA in detail (pays to read these things before posting). It says:

Append is not supported in Hadoop 1.x. Please upgrade to 2.x if you need append. If you enabled dfs.support.append for HBase, you're OK, as durable sync (why HBase required dfs.support.append) is now enabled by default. If you really need the previous functionality, to turn on the append functionality set the flag "dfs.support.broken.append" to true.

That says to me you can have append working if you set dfs.support.broken.append to true. So append appears to be available in 1.x, but it is hardly recommended.
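
For anyone who does want to experiment, my reading is that it comes down to one property in hdfs-site.xml (untested on my side, and the property name alone is a warning):

    <!-- hdfs-site.xml: re-enables the old 1.x append path -->
    <property>
      <name>dfs.support.broken.append</name>
      <value>true</value>
    </property>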

Robin


On 5 Jul 2013, at 08:45, Robin East <robin.east@xense.co.uk> wrote:

The API for 1.1.2 FileSystem seems to include append().
Robin
On 5 Jul 2013, at 01:50, Mohammad Tariq <dontariq@gmail.com> wrote:

The current stable release doesn't support append, not even through the API. If you really want this you have to switch to Hadoop 2.x.
See this JIRA.

Warm Regards,
Tariq
cloudfront.blogspot.com

On Fri, Jul 5, 2013 at 3:05 AM, John Lilley <john.lilley@redpoint.net> wrote:

Manickam,

HDFS supports append; it is the command-line client that does not.

You can write a Java application that opens an HDFS-based file for append, and use that instead of the hadoop command line.
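
For example, something along these lines (the paths are made up, and it assumes append is actually enabled on your cluster):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsAppend {
        public static void main(String[] args) throws Exception {
            // Picks up core-site.xml / hdfs-site.xml from the classpath
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/data/input.txt"); // made-up path
            // Throws if the cluster does not support/allow append
            FSDataOutputStream out = fs.append(file);
            try {
                out.write("new records\n".getBytes("UTF-8"));
            } finally {
                out.close();
            }
        }
    }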

However, this doesn't completely answer your original question: "How do I move only the delta part?" This can be more complex than simply doing an append. Have records in the original file changed in addition to new records becoming available? If that is the case, you will need to completely rewrite the file, as there is no overwriting of existing file sections, even directly using HDFS. There are clever strategies for working around this, like splitting the file into multiple parts on HDFS so that the overwrite can proceed in parallel on the cluster; however, that may be more work than you are looking for. Even if the delta is limited to new records, the problem may not be trivial. How do you know which records are new? Are all of the new records at the end of the file? Or can they be anywhere in the file? If the latter, you will need more complex logic.
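
If the new records are guaranteed to be at the end, the delta copy could look roughly like this (again made-up paths; it assumes the HDFS copy is an exact byte-for-byte prefix of the local file, and that append is enabled):

    import java.io.RandomAccessFile;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DeltaCopy {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path hdfsFile = new Path("/data/input.txt");         // made-up path
            long uploaded = fs.getFileStatus(hdfsFile).getLen(); // bytes HDFS already has
            RandomAccessFile local =
                new RandomAccessFile("/local/input.txt", "r");   // made-up path
            local.seek(uploaded); // skip the prefix that is already in HDFS
            FSDataOutputStream out = fs.append(hdfsFile);
            try {
                byte[] buf = new byte[64 * 1024];
                int n;
                while ((n = local.read(buf)) > 0) {
                    out.write(buf, 0, n);
                }
            } finally {
                out.close();
                local.close();
            }
        }
    }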

John

From: Mohammad Tariq [mailto:dontariq@gmail.com]
Sent: Thursday, July 04, 2013 5:47 AM
To: user@hadoop.apache.org
Subject: Re: How to update a file which is in HDFS

Hello Manickam,

        Append is currently not possible.

Warm Regards,
Tariq
cloudfront.blogspot.com

On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <manickam.p@outlook.com> wrote:

Hi,

I have moved my input file into the HDFS location in the cluster setup.

Now I have got a new file which has some new records along with the old ones.

I want to move only the delta part into HDFS, because it takes a long time to move the whole file from my local machine to the HDFS location.

Is this possible, or do I need to move the entire file into HDFS again?

Thanks,
Manickam P