From: Robin East
Subject: Re: How to update a file which is in HDFS
Date: Fri, 5 Jul 2013 08:45:24 +0100
To: user@hadoop.apache.org
The API for 1.1.2 FileSystem seems to include append().

Robin

On 5 Jul 2013, at 01:50, Mohammad Tariq <dontariq@gmail.com> wrote:

> The current stable release doesn't support append, not even through the API. If you really want this, you have to switch to Hadoop 2.x.
> See this JIRA.
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Fri, Jul 5, 2013 at 3:05 AM, John Lilley <john.lilley@redpoint.net> wrote:
>
> Manickam,
>
> HDFS supports append; it is the command-line client that does not.
>
> You can write a Java application that opens an HDFS-based file for append, and use that instead of the hadoop command line.
>
> However, this doesn't completely answer your original question: "How do I move only the delta part?" This can be more complex than simply doing an append. Have records in the original file changed, in addition to new records becoming available? If so, you will need to rewrite the file completely, as there is no overwriting of existing file sections, even when using HDFS directly. There are clever strategies for working around this, such as splitting the file into multiple parts on HDFS so that the overwrite can proceed in parallel on the cluster; however, that may be more work than you are looking for. Even if the delta is limited to new records, the problem may not be trivial. How do you know which records are new? Are all of the new records at the end of the file? Or can they be anywhere in the file? If the latter, you will need more complex logic.
>
> John
>
> From: Mohammad Tariq [mailto:dontariq@gmail.com]
> Sent: Thursday, July 04, 2013 5:47 AM
> To: user@hadoop.apache.org
> Subject: Re: How to update a file which is in HDFS
>
> Hello Manickam,
>
> Append is currently not possible.
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
> On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <manickam.p@outlook.com> wrote:
>
> Hi,
>
> I have moved my input file into the HDFS location in the cluster setup.
> Now I got a new set of the file which has some new records along with the old ones.
> I want to move only the delta part into HDFS, because moving the whole file from my local machine to HDFS takes too long.
> Is it possible, or do I need to move the entire file into HDFS again?
>
> Thanks,
> Manickam P
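Putting Robin's and John's suggestions together, for the easy case John identifies (the new file is strictly the old file plus records appended at the end), the delta is just the byte range past the old length. A minimal sketch follows; it uses plain java.io so it can run anywhere, and the file names are illustrative. On a real cluster the output stream would instead be the one returned by FileSystem.append() (per Robin, present in the 1.1.2 API; fully supported in 2.x):

```java
import java.io.*;

public class TailCopy {
    // Copy bytes [fromOffset, src.length()) of src into out; return bytes copied.
    // On HDFS, 'out' would be the stream returned by fs.append(new Path(...)) --
    // the local-file version here only illustrates the delta-extraction step.
    static long copyTail(File src, long fromOffset, OutputStream out) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(src, "r")) {
            in.seek(fromOffset);
            byte[] buf = new byte[8192];
            long copied = 0;
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
                copied += n;
            }
            return copied;
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate: the HDFS copy already holds "old records\n" (12 bytes);
        // the new local file has additional records appended after it.
        File src = File.createTempFile("newfile", ".txt");
        src.deleteOnExit();
        try (FileWriter w = new FileWriter(src)) {
            w.write("old records\nnew records\n");
        }
        long oldLen = "old records\n".length(); // length already present on HDFS
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copyTail(src, oldLen, out);
        System.out.println(copied + ":" + out.toString("UTF-8").trim());
        // prints "12:new records"
    }
}
```

This only works if new records are guaranteed to land at the end; if existing records can change anywhere in the file, you are back to John's full-rewrite (or split-and-rewrite-in-parallel) strategies.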