hadoop-common-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Appending to existing files in HDFS
Date Sat, 18 Sep 2010 03:22:03 GMT
  Once you close it, the HDFS daemons own the file and make sure it's 
copied around. Allowing reopens at this point makes that distribution 
control that much more complex: asynchronous processes have to agree 
that the old file is now longer.

Another thing to keep in mind is that HDFS has block sizes in the 
megabytes; 64 MB and 128 MB are common. An HDFS file should be designed 
to be maybe 90% of this size when you write it and close it.
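
A minimal sketch of what that looks like with the Java API this thread 
is about; the path, replication factor, and 128 MB block size below are 
illustrative assumptions, not values from this thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizedWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/logs/events-000.log");  // hypothetical path
            long blockSize = 128L * 1024 * 1024;  // 128 MB HDFS block size
            short replication = 3;
            int bufferSize = 4096;
            // create(path, overwrite, bufferSize, replication, blockSize)
            FSDataOutputStream out = fs.create(path, true, bufferSize,
                                               replication, blockSize);
            // ... write until the file approaches ~90% of blockSize ...
            out.close();  // only after close do the daemons own a complete file
        }
    }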

Chittaranjan Hota wrote:
> Hi Steve,
> Thanks for the inputs.
> I understand by now that the files are "immutable"; I just wanted to 
> confirm. However, I am a little confused as to what role the "append" 
> methods play.
> I am now going to explore how it works out when I keep a stream open, 
> write data to it, and close it on an interval basis.
> Thanks again.
> Regards,
> Chitta
> MobileMe Reporting
> Ext: 21294
> Direct: 408-862-1294
> On Sep 17, 2010, at 2:57 PM, Steve Hoffman wrote:
>> This is a "feature" of HDFS.  Files are immutable.
>> You have to create a new file.  The file you are writing to isn't
>> available in HDFS until you close it.
>> Usually you'll have something buffering pieces and writing to HDFS.
>> Then you can roll those smaller files into larger chunks using a
>> nightly map-reduce job or something else (see the sketch below).
>> You might want to look at the Flume project from Cloudera (there are
>> others as well) and just log4j to local disk.  Then use Flume agents
>> to send to a collector (or collectors) which write to HDFS on an
>> interval or other criteria.  Facebook's Scribe and Apache Chukwa are
>> also contenders for these tasks.
>> Log collection seems to be a common use of hadoop these days.
>> If you google it, you'll find plenty of stuff.
>> Also (shameless plug for a presentation I just gave on this topic):
>> http://bit.ly/hoffmanchug20100915
>> Hope this helps!
>> Steve
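
A rough sketch of the write-then-roll pattern Steve describes (and the 
interval-based closing Chitta plans to try): open a new file, write for 
an interval, close it, repeat. The directory, file naming, and 
10-minute interval are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RollingHdfsWriter {
        private static final long ROLL_INTERVAL_MS = 10 * 60 * 1000L;

        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            while (true) {
                Path current = new Path("/logs/incoming/events-"
                        + System.currentTimeMillis() + ".log");
                FSDataOutputStream out = fs.create(current);
                long deadline = System.currentTimeMillis() + ROLL_INTERVAL_MS;
                while (System.currentTimeMillis() < deadline) {
                    // A real collector would drain the JMS queue here;
                    // this write stands in for that work.
                    out.writeBytes("message\n");
                    Thread.sleep(1000);
                }
                out.close();  // file becomes visible and complete on close
                // A nightly map-reduce job can later compact the small
                // files under /logs/incoming into larger chunks.
            }
        }
    }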
>> On Fri, Sep 17, 2010 at 1:43 PM, Chittaranjan Hota <hota@apple.com> 
>> wrote:
>>> Hello,
>>> I am new to Hadoop and to this forum.
>>> Existing setup:
>>> Basically we have an existing setup where data is collected from a 
>>> JMS queue and written to the hard disk without Hadoop: typical I/O 
>>> using log4j.
>>> Problem Statement:
>>> Now, instead of writing it to the hard disk, I would like to stream 
>>> it to HDFS. I know that's possible using the "FileSystem" class and 
>>> its create method; I did a small POC on that as well.
>>> However, I am not able to append to the created files.
>>> It throws the exception:
>>> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Append 
>>> to hdfs
>>> not supported. Please refer to dfs.support.append configuration 
>>> parameter.
>>> I am looking for any pointers/suggestions to resolve this.
>>> Please let me know if you need any further information.
>>> Thanks in advance.
>>> Regards,
>>> Chitta
>>> MobileMe Reporting
>>> Ext: 21294
>>> Direct: 408-862-1294
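
On the exception itself: in the 0.20-era releases this thread dates 
from, append was disabled by default and had to be switched on with the 
dfs.support.append property in hdfs-site.xml on the cluster (setting it 
only on the client side is not enough). Even when enabled, append was 
considered unreliable in those releases, which is why the write-and-roll 
pattern above is usually recommended instead. A minimal sketch, with a 
hypothetical path and assuming the cluster has the property enabled:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendSketch {
        public static void main(String[] args) throws Exception {
            // Requires dfs.support.append=true in hdfs-site.xml on the
            // cluster; otherwise this throws the RemoteException above.
            FileSystem fs = FileSystem.get(new Configuration());
            Path path = new Path("/logs/events-000.log");  // must already exist
            FSDataOutputStream out = fs.append(path);
            out.writeBytes("another record\n");
            out.close();
        }
    }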
