hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: What about append in hadoop files ?
Date Fri, 14 Jul 2006 07:02:24 GMT
Thomas FRIOL wrote:
> I would like to know today why it is not possible to append data to 
> an existing file (Path) or why the FSDataOutputStream must be closed 
> before the file is written to the DFS.

Those are the current semantics of the filesystem: a file is not readable 
until it is closed, and files are write-once.  This considerably 
simplifies the implementation and supports the primary intended uses for 
DFS.  The simpler we keep DFS the easier it is to make it reliable and 
scalable.  At this point we are prioritizing reliability and scalability 
over new features.  Over time, when reliability and scalability are 
sufficiently demonstrated, these restrictions may be removed.

> In fact, my problem is that I have a servlet which is regularly writing 
> data into a file in the DFS. Today, if my JVM crashes, I lose all my 
> data because my output stream is closed only when the JVM stops itself.

You could periodically close the file and start writing a new file.
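
A rough sketch of that idea, under some assumptions: RollingWriter, its
record-count threshold, and the segment naming are all hypothetical and not
part of Hadoop, and the local filesystem (java.nio.file) stands in for DFS so
the example runs on its own. With DFS you would presumably open each segment
through FileSystem.create(Path) instead.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not a Hadoop class): closes the current file and
// starts a new one every maxRecords records, so only the records in the
// currently open segment are at risk if the JVM crashes.
class RollingWriter implements AutoCloseable {
    private final Path dir;
    private final String prefix;
    private final int maxRecords;
    private BufferedWriter out;      // currently open segment, or null
    private int recordsInSegment;    // records written to the open segment
    private int segmentIndex;        // index used to name the next segment

    RollingWriter(Path dir, String prefix, int maxRecords) {
        if (maxRecords < 1) {
            throw new IllegalArgumentException("maxRecords must be >= 1");
        }
        this.dir = dir;
        this.prefix = prefix;
        this.maxRecords = maxRecords;
    }

    void write(String record) throws IOException {
        if (out == null) {
            // Assumption: local file as a stand-in for a DFS output stream.
            Path segment = dir.resolve(
                String.format("%s-%05d.log", prefix, segmentIndex++));
            out = Files.newBufferedWriter(segment, StandardCharsets.UTF_8);
            recordsInSegment = 0;
        }
        out.write(record);
        out.newLine();
        if (++recordsInSegment >= maxRecords) {
            close();  // roll: the next write opens a fresh segment
        }
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
            out = null;
        }
    }
}
```

Rolling on a record count keeps the sketch deterministic; a servlet might
prefer to roll on elapsed time or bytes written. Either way, since a file only
becomes readable once it is closed, a crash should cost at most the contents
of the one segment that was still open.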

DFS is currently primarily used to support large, offline, batch 
computations.  For example, a log of critical data with tight 
transactional requirements is probably an inappropriate use of DFS at 
this time.  Again, this may change, but that's where we are now.

