hadoop-user mailing list archives

From <hadoop.supp...@visolve.com>
Subject RE: About HDFS's single-writer, multiple-reader model, any use case?
Date Mon, 02 Feb 2015 07:36:13 GMT
Hello Dongzhe Ma,

 

Yes, HDFS employs a single-writer, multiple-reader model. This means:

 

WRITE

•       An HDFS client maintains a lease on each file it has opened for writing (the lease
covers the entire file, not individual blocks).

•       Only one client can hold the lease on a given file at a time.

•       For each block of data, the client sets up a pipeline of DataNodes to write to.

•       A file that has been written cannot be modified in place, but it can be appended to.

•       The client periodically renews the lease by sending heartbeats to the NameNode.

•       Lease timeout/expiration:

•       Soft limit: until the soft limit expires, the lease holder has exclusive write access
to the file and can extend the lease by renewing it.

•       After the soft limit expires, any other client can claim the lease.

•       Hard limit (1 hour): between the soft and hard limits, the original holder continues
to have access unless some other client pre-empts it.

•       After the hard limit expires, the file is closed on behalf of the writer.
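
To make the lease rules above concrete, here is a minimal toy sketch in Python. It is not the NameNode's implementation; the class ToyLeaseManager and its methods are hypothetical names made up for illustration, and the 60-second soft limit is an assumed value (the points above only give the 1-hour hard limit). Time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass

SOFT_LIMIT_S = 60        # assumption for illustration; not stated in this message
HARD_LIMIT_S = 60 * 60   # 1 hour, as described above


@dataclass
class Lease:
    holder: str
    last_renewed: float


class ToyLeaseManager:
    """Toy model of per-file write leases (hypothetical, not HDFS code)."""

    def __init__(self):
        self.leases = {}  # path -> Lease

    def _expire_hard(self, path, now):
        # After the hard limit the file is treated as closed and the lease dropped.
        lease = self.leases.get(path)
        if lease is not None and now - lease.last_renewed > HARD_LIMIT_S:
            del self.leases[path]

    def open_for_write(self, path, client, now):
        """Return True if `client` obtains (or already holds) the lease on `path`."""
        self._expire_hard(path, now)
        lease = self.leases.get(path)
        if lease is None or lease.holder == client:
            self.leases[path] = Lease(client, now)
            return True
        if now - lease.last_renewed > SOFT_LIMIT_S:
            # Soft limit has passed: another client may pre-empt the holder.
            self.leases[path] = Lease(client, now)
            return True
        return False  # holder still has exclusive write access

    def renew(self, path, client, now):
        """Heartbeat from `client`; returns False if it no longer holds the lease."""
        self._expire_hard(path, now)
        lease = self.leases.get(path)
        if lease is not None and lease.holder == client:
            lease.last_renewed = now
            return True
        return False

    def holder(self, path, now):
        self._expire_hard(path, now)
        lease = self.leases.get(path)
        return lease.holder if lease else None
```

For example, if client A opens a file at t=0 and renews at t=50, a second client B is refused at t=100 (inside the soft limit) but can take over the lease at t=111, after which A's renewals fail.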

READ

•       The client gets the list of DataNodes holding each block from the NameNode, sorted by
network-topology distance from the client, and then reads the data directly from the DataNodes.

•       During a read, the checksum is validated; if it does not match, the client reports the
corrupt replica to the NameNode, which marks it for deletion.

•       On an error while reading a block, the client reads it from the next replica in the list.
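
The read path above can be sketched the same way. This is only an illustration under stated assumptions: Replica and read_block are hypothetical names, the distance field stands in for the topology ordering returned by the NameNode, and a single zlib.crc32 over the whole block stands in for HDFS's own checksum scheme.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Replica:
    node: str       # DataNode name (hypothetical)
    distance: int   # assumed network distance from the reader
    data: bytes


def read_block(replicas, expected_crc):
    """Try replicas nearest-first; return (data, node_used, corrupt_nodes)."""
    corrupt = []
    for replica in sorted(replicas, key=lambda r: r.distance):
        if zlib.crc32(replica.data) == expected_crc:
            return replica.data, replica.node, corrupt
        # Checksum mismatch: in HDFS this replica would be reported to the NameNode.
        corrupt.append(replica.node)
    raise IOError("no valid replica found; corrupt replicas: %s" % corrupt)


if __name__ == "__main__":
    good = b"hello hdfs"
    replicas = [
        Replica("dn2", 2, good),
        Replica("dn1", 0, b"hellX hdfs"),  # nearest replica, but corrupted
        Replica("dn3", 4, good),
    ]
    data, node, corrupt = read_block(replicas, zlib.crc32(good))
    print(node, corrupt)
```

The nearest replica is tried first; because its checksum does not match, it is recorded as corrupt and the read falls through to the next-nearest replica.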

 

You can refer to the link below for creating test cases.

 

http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/

 

Hope this helps!!!

 

Thanks and Regards,
S.RagavendraGanesh

Hadoop Support Team

ViSolve Inc. | www.visolve.com


From: Dongzhe Ma [mailto:mdzfirst@gmail.com] 
Sent: Monday, February 02, 2015 10:50 AM
To: user@hadoop.apache.org
Subject: About HDFS's single-writer, multiple-reader model, any use case?

 

We know that HDFS employs a single-writer, multiple-reader model, which means that only one
process can write to a file at any given time, while multiple readers can work in parallel,
and new readers can even observe the newly written content. The reason for this design is
to simplify concurrency control. But is it necessary to support reading during writing? Can
anyone bring up some use cases? Why not just lock the whole file like other POSIX file systems
(in terms of locking granularity)?

