hadoop-mapreduce-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Re: How to update a file which is in HDFS
Date Fri, 05 Jul 2013 00:50:17 GMT
The current stable release doesn't support append, not even through the
API. If you really need it, you have to switch to Hadoop 2.x.
See this JIRA <https://issues.apache.org/jira/browse/HADOOP-8230>.
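
For reference, here is a minimal sketch of the client side of a delta upload, assuming a release where append is available. It uses only the Java standard library and copies the not-yet-uploaded tail of a local file into any OutputStream. The names DeltaAppend and copyTail are hypothetical. On Hadoop 2.x the stream could be the one returned by FileSystem.append(Path), but that call is not exercised here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Hadoop.
final class DeltaAppend {

    /**
     * Copies the bytes of localFile from offset alreadyUploaded to the end
     * of the file into out, and returns the number of bytes copied.
     * Assumption: the first alreadyUploaded bytes are unchanged.
     */
    public static long copyTail(Path localFile, long alreadyUploaded, OutputStream out)
            throws IOException {
        try (FileChannel channel = FileChannel.open(localFile, StandardOpenOption.READ)) {
            if (alreadyUploaded < 0 || alreadyUploaded > channel.size()) {
                throw new IllegalArgumentException(
                        "offset " + alreadyUploaded + " is outside the local file");
            }
            // Start reading where the previous upload stopped.
            channel.position(alreadyUploaded);
            InputStream in = Channels.newInputStream(channel);
            byte[] buffer = new byte[64 * 1024];
            long copied = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                copied += n;
            }
            return copied;
        }
    }
}
```

The offset would typically be the length of the file already in HDFS. This only helps when the new records are all at the end of the file.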

Warm Regards,
Tariq
cloudfront.blogspot.com


On Fri, Jul 5, 2013 at 3:05 AM, John Lilley <john.lilley@redpoint.net> wrote:

> Manickam,
>
> HDFS supports append; it is the command-line client that does not.
>
> You can write a Java application that opens an HDFS-based file for append,
> and use that instead of the hadoop command line.
>
> However, this doesn’t completely answer your original question: “How do I
> move only the delta part?”  This can be more complex than simply doing an
> append.  Have records in the original file changed in addition to new
> records becoming available?  If that is the case, you will need to
> completely rewrite the file, as there is no overwriting of existing file
> sections, even directly using HDFS.  There are clever strategies for
> working around this, like splitting the file into multiple parts on HDFS so
> that the overwrite can proceed in parallel on the cluster; however, that
> may be more work than you are looking for.  Even if the delta is limited to
> new records, the problem may not be trivial.  How do you know which records
> are new?  Are all of the new records at the end of the file?  Or can they be
> anywhere in the file?  If the latter, you will need more complex logic.
>
> John
>
> *From:* Mohammad Tariq [mailto:dontariq@gmail.com]
> *Sent:* Thursday, July 04, 2013 5:47 AM
> *To:* user@hadoop.apache.org
> *Subject:* Re: How to update a file which is in HDFS
>
> Hello Manickam,
>
> Append is currently not possible.
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
> On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <manickam.p@outlook.com> wrote:
>
> Hi,
>
> I have moved my input file into the HDFS location in the cluster setup.
>
> Now I have a new version of the file, which has some new records along
> with the old ones.
>
> I want to move only the delta part into HDFS, because moving the whole
> file from my local machine to the HDFS location takes a long time.
>
> Is that possible, or do I need to move the entire file into HDFS again?
>
> Thanks,
> Manickam P
>
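
John's question "Are all of the new records at the end of the file?" can be checked mechanically if the previously uploaded version is still available locally, which is an assumption here. The sketch below uses only the Java standard library, and AppendOnlyCheck and isAppendOnly are hypothetical names. It reports whether the new file begins with the exact bytes of the old one; only in that case can an append carry the whole delta.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Hadoop.
final class AppendOnlyCheck {

    /** Returns true when newFile starts with the exact bytes of oldFile. */
    public static boolean isAppendOnly(Path oldFile, Path newFile) throws IOException {
        // A shorter new file cannot contain the old one as a prefix.
        if (Files.size(newFile) < Files.size(oldFile)) {
            return false;
        }
        try (InputStream oldIn = new BufferedInputStream(Files.newInputStream(oldFile));
             InputStream newIn = new BufferedInputStream(Files.newInputStream(newFile))) {
            int b;
            // Compare byte by byte until the old file is exhausted.
            while ((b = oldIn.read()) != -1) {
                if (b != newIn.read()) {
                    return false; // an existing record changed: a rewrite is needed
                }
            }
            return true;
        }
    }
}
```

If the check returns false, existing records have changed or new ones were inserted mid-file, and the file has to be rewritten as described above.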
