hadoop-general mailing list archives

From Bernd Fondermann <bernd.fonderm...@googlemail.com>
Subject Re: What if I want do random write in Hadoop?
Date Wed, 02 Dec 2009 09:47:51 GMT
On Wed, Dec 2, 2009 at 09:45, xiao yang <yangxiao9901@gmail.com> wrote:
> FSDataInputStream is seekable, but FSDataOutputStream is not?
> Why? What are the difficulties to support random write?

Simple Coherency Model

HDFS applications need a write-once-read-many access model for files.
A file once created, written, and closed need not be changed. This
assumption simplifies data coherency issues and enables high
throughput data access. A Map/Reduce application or a web crawler
application fits perfectly with this model. There is a plan to support
appending writes to files in the future.

Data in HDFS is replicated (potentially across data center
boundaries). If a file could be changed in place, old copies of the
data would remain on other replicas while the change was being made.
That would cause consistency problems for massively parallel reads,
one of the strengths of HDFS. To avoid these complications, data
that has been written cannot be changed.
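To make the write-once-read-many model above concrete, here is a minimal Java sketch of a hypothetical `WriteOnceFile` class. It is not part of Hadoop; every class and method name in it is invented for illustration. It mirrors the asymmetry in the question: the writer can only append at its current position and has no `seek()`, the contents are sealed on `close()`, and readers, which only ever see sealed bytes, may seek freely. Real HDFS is considerably more involved (files are split into replicated blocks, among other things), so treat this as an illustration of the model, not of the implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/**
 * Hypothetical write-once-read-many file (NOT the Hadoop API; names are invented).
 * Writers may only append at the end and the contents are sealed on close();
 * readers only ever see sealed bytes, so they are free to seek.
 */
final class WriteOnceFile {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private byte[] sealed; // null while the file is open for writing

    /** Sequential write: data always goes to the current end of the file. */
    void write(byte[] data) {
        if (sealed != null) {
            throw new IllegalStateException("file is closed; written data cannot be changed");
        }
        buffer.write(data, 0, data.length);
    }

    /** A writer's only position is the number of bytes written so far; there is no seek(). */
    long getPos() {
        return sealed != null ? sealed.length : buffer.size();
    }

    /** Seals the contents; after this the file is read-only. Calling it twice is harmless. */
    void close() {
        if (sealed == null) {
            sealed = buffer.toByteArray();
        }
    }

    /** Opens a reader over the sealed bytes; refuses while the file is still being written. */
    SeekableReader openReader() {
        if (sealed == null) {
            throw new IllegalStateException("file is still open for writing");
        }
        return new SeekableReader(sealed);
    }

    /** Random-access reader: safe because the underlying bytes never change. */
    static final class SeekableReader {
        private final byte[] data;
        private int pos = 0;

        private SeekableReader(byte[] data) {
            this.data = data;
        }

        /** Moves the read position anywhere in [0, length]. */
        void seek(long newPos) {
            if (newPos < 0 || newPos > data.length) {
                throw new IllegalArgumentException("seek out of range: " + newPos);
            }
            pos = (int) newPos;
        }

        /** Returns up to len bytes from the current position and advances past them. */
        byte[] read(int len) {
            if (len < 0) {
                throw new IllegalArgumentException("negative length: " + len);
            }
            int end = (int) Math.min((long) data.length, (long) pos + len);
            byte[] out = Arrays.copyOfRange(data, pos, end);
            pos = end;
            return out;
        }
    }
}
```

A writer calls `write()` one or more times and then `close()`; any later `write()` is rejected. Only after that does `openReader()` hand out a reader, so after writing "hello world", `seek(6)` followed by `read(5)` returns the bytes of "world", and every reader sees the same bytes no matter where it seeks.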

Other distributed systems, for example Apache CouchDB, have
different consistency models.

