hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Append data in hdfs_write
Date Thu, 27 Mar 2008 17:35:06 GMT


 The present work-arounds for this are pretty complicated.

Option 1) Write small files relatively frequently. Each time you have
written some number of them, concatenate them into one larger file and
delete the originals.  The concatenated files can receive the same
treatment in turn.  If managed carefully in conjunction with a safe status
update mechanism like ZooKeeper, you can have a pretty robust system that
reflects new data with fairly low latency (on the order of seconds behind).
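
A rough sketch of the concatenate-and-delete step in option 1, in case it
helps.  It uses a local directory as a stand-in for HDFS so it runs
anywhere; the function name compact, the "part-" and "merged-" prefixes,
and the batch size are all made up for illustration and are not part of
any Hadoop API.  The final rename plays the role of the "safe status
update": readers never see a half-written merged file.  A real system
would also need something like ZooKeeper to tell readers which files are
current during the short window where both the merged file and its parts
exist.

```python
import os
import tempfile


def compact(directory, prefix="part-", batch=4, merged_prefix="merged-"):
    """Concatenate the oldest `batch` small files into one file, then delete them.

    Returns the merged file's name, or None if fewer than `batch` files exist.
    All names here are hypothetical; a local directory stands in for HDFS.
    """
    parts = sorted(f for f in os.listdir(directory) if f.startswith(prefix))
    if len(parts) < batch:
        return None
    group = parts[:batch]
    first, last = group[0][len(prefix):], group[-1][len(prefix):]
    merged_name = "%s%s_%s" % (merged_prefix, first, last)
    # Write under a temporary name so readers never see a partial file.
    tmp_path = os.path.join(directory, "." + merged_name + ".tmp")
    with open(tmp_path, "wb") as out:
        for name in group:
            with open(os.path.join(directory, name), "rb") as src:
                out.write(src.read())
    # Publish the merged file, then remove the small files it replaces.
    os.rename(tmp_path, os.path.join(directory, merged_name))
    for name in group:
        os.remove(os.path.join(directory, name))
    return merged_name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for i in range(4):
            with open(os.path.join(d, "part-%04d" % i), "wb") as f:
                f.write(b"record %d\n" % i)
        print(compact(d))
        print(sorted(os.listdir(d)))
```

Calling the same function again with prefix="merged-" and a different
merged_prefix would give the merged files the same treatment.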

Option 2) Accumulate data in a non-HDFS location until it is big enough to
push to HDFS.  This can be done in conjunction with option 1.  The danger
is that you lose data if the accumulator fails before it has flushed its
contents to HDFS.  This approach is very commonly used for log files that
are consolidated at the hourly level and then transferred to HDFS.
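
A minimal sketch of the accumulator in option 2, again with local
directories standing in for both the staging area and HDFS.  The class
name Accumulator, the "chunk-" file naming, and the byte threshold are
assumptions made for the example; a real version would push with
whatever HDFS client you use.  Note that anything still sitting in the
local buffer when the machine dies is lost, which is exactly the risk
described above.

```python
import os
import shutil
import tempfile


class Accumulator:
    """Buffer records in a local file and push it to `dest_dir` once it is big enough.

    Hypothetical helper: `dest_dir` stands in for an HDFS directory.
    """

    def __init__(self, local_dir, dest_dir, threshold_bytes):
        self.buffer_path = os.path.join(local_dir, "buffer.log")
        self.dest_dir = dest_dir
        self.threshold_bytes = threshold_bytes
        self.seq = 0  # sequence number used to name pushed chunks

    def append(self, record):
        """Append bytes to the local buffer; push if the threshold is reached."""
        with open(self.buffer_path, "ab") as f:
            f.write(record)
        if os.path.getsize(self.buffer_path) >= self.threshold_bytes:
            return self.flush()
        return None

    def flush(self):
        """Copy the buffer to the destination as a new file, then clear it."""
        if not os.path.exists(self.buffer_path):
            return None
        if os.path.getsize(self.buffer_path) == 0:
            return None
        name = "chunk-%06d" % self.seq
        shutil.copyfile(self.buffer_path, os.path.join(self.dest_dir, name))
        # Only remove the local copy after the push has completed.
        os.remove(self.buffer_path)
        self.seq += 1
        return name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as local, tempfile.TemporaryDirectory() as dest:
        acc = Accumulator(local, dest, threshold_bytes=32)
        for i in range(10):
            acc.append(b"log line %d\n" % i)
        acc.flush()  # push whatever is left, e.g. at the end of the hour
        print(sorted(os.listdir(dest)))
```

Each pushed chunk is a new, closed file, so this never needs to reopen an
existing HDFS file for writing.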

On 3/27/08 12:02 AM, "Raghavendra K" <raghavendra83@gmail.com> wrote:

> Hi,
> Thanks for the reply.
> Does this mean that once I close a file, I can open it only for reading?
> And if I reopen the same file to write any data, then the old data will be
> lost, and it's as good as a new file being created with the same name?
> On Thu, Mar 27, 2008 at 12:23 PM, dhruba Borthakur <dhruba@yahoo-inc.com>
> wrote:
>> HDFS files, once closed, cannot be reopened for writing. See HADOOP-1700
>> for more details.
>> Thanks,
>> dhruba
>> -----Original Message-----
>> From: Raghavendra K [mailto:raghavendra83@gmail.com]
>> Sent: Wednesday, March 26, 2008 11:29 PM
>> To: core-user@hadoop.apache.org
>> Subject: Append data in hdfs_write
>> Hi,
>> I am using hdfsWrite to write data to a file.
>> Whenever I close the file and reopen it for writing, it starts writing
>> from position 0 (overwriting the old data).
>> Is there any way to append data to a file using hdfsWrite?
>> I cannot use hdfsTell because it works only when the file is opened in
>> RDONLY mode, and I also don't know the number of bytes previously
>> written to the file.
>> Please shed some light on this.
>> --
>> Regards,
>> Raghavendra K
