hadoop-user mailing list archives

From Dongzhe Ma <mdzfi...@gmail.com>
Subject Re: About HDFS's single-writer, multiple-reader model, any use case?
Date Mon, 02 Feb 2015 08:00:55 GMT
Hi,

Thanks for your quick reply. But the point that puzzles me is why HDFS
chose such a model, not how this model works.

I recently read the append design doc, and one of the difficulties in
implementing append correctly is maintaining read consistency, which is
only a problem when someone reads a file while it is being written. So I
wonder: why not lock the whole file during write operations and reject
any read requests?
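
To make the question concrete: a minimal sketch (assuming a reachable
HDFS cluster via the default configuration; the path and class name are
hypothetical) of a new reader opening a file that a writer still holds
open, and observing whatever the writer has hflush()'ed so far:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadDuringWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);   // default file system (e.g. hdfs://...)
            Path p = new Path("/tmp/demo.txt");     // hypothetical path

            // Writer: holds the lease on the file while this stream is open.
            FSDataOutputStream out = fs.create(p, true);
            out.writeBytes("first batch\n");
            out.hflush();                           // make the bytes visible to new readers

            // Reader: may open the file even though the writer still holds the
            // lease, and can see the hflush()'ed bytes -- this is exactly the
            // read-consistency case the append design doc has to handle.
            FSDataInputStream in = fs.open(p);
            byte[] buf = new byte[64];
            int n = in.read(buf);
            System.out.println("reader saw " + n + " bytes while the file was open for write");
            in.close();

            out.writeBytes("second batch\n");
            out.close();                            // lease released, length finalized
            fs.close();
        }
    }

Locking the file for the whole write would instead mean fs.open() has to
fail (or block) until out.close() runs.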

I think the best answer to this question is a concrete use case. I have
tried to come up with one for a few days, but in vain. A good example of
why append matters in HDFS is supporting applications that generate lots
of log files, such as HBase: by allowing files to be re-opened and
appended to, restarted instances can reuse existing log files, so we
don't create too many files and overwhelm the NameNode. But again, these
log files won't be read in parallel with the writer.
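
For instance, a restarted instance could reuse its existing log file
roughly like this (a sketch assuming append is enabled on the cluster;
the path and class name are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReopenLog {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path log = new Path("/logs/region-server-1.log");  // hypothetical log file

            // Re-open the existing file instead of creating a new one on every
            // restart, so the NameNode does not accumulate an ever-growing
            // number of small files.
            FSDataOutputStream out = fs.exists(log) ? fs.append(log) : fs.create(log);
            out.writeBytes("restarted at " + System.currentTimeMillis() + "\n");
            out.hflush();   // makes the record visible; hsync() would also force it to disk
            out.close();
            fs.close();
        }
    }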

Since normal POSIX file systems don't support that feature, I wonder
whether anyone knows of an example from cloud applications.

2015-02-02 15:36 GMT+08:00 <hadoop.support@visolve.com>:

> Hello Dongzhe Ma,
>
>
>
> Yes, HDFS employs a single-writer, multiple-reader model. This means:
>
>
>
> *WRITE*
>
> •       The HDFS client maintains a lease on each file it opens for write
> (the lease covers the entire file, not individual blocks).
>
> •       Only one client can hold the lease on a given file at a time.
>
> •       For each block of data, the client sets up a pipeline of DataNodes
> to write to.
>
> •       A file, once written, cannot be modified, but it can be appended to.
>
> •       The client periodically renews the lease by sending heartbeats to
> the NameNode.
>
> •       Lease timeout/expiration:
>
>         •       *Soft limit*: until it expires, the writer has exclusive
>         access to the file and can keep extending the lease by renewing it.
>
>         •       After the soft limit expires, any other client may claim
>         the lease and pre-empt the original writer.
>
>         •       *Hard limit* (1 hour): the writer keeps its access unless
>         another client pre-empts it; once the hard limit expires, HDFS
>         closes the file and recovers the lease on the writer's behalf.
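
A rough sketch of the lease rules above, using the Java FileSystem API
(assuming a reachable cluster; the path and class name are hypothetical,
and the exact exception reported to the second writer may vary):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.io.IOException;

    public class LeaseDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path p = new Path("/tmp/lease-demo.txt");   // hypothetical path

            // First client: creating the file grants it the file-level lease.
            FileSystem writer = FileSystem.newInstance(conf);
            FSDataOutputStream out = writer.create(p, true);
            out.writeBytes("held by the first writer\n");
            out.hflush();

            // Second client: the NameNode rejects its attempt to open the same
            // file for write while the first client's lease is still valid.
            FileSystem other = FileSystem.newInstance(conf);
            try {
                other.append(p);
            } catch (IOException expected) {
                System.out.println("second writer rejected: " + expected.getMessage());
            }

            out.close();      // releases the lease; now another client could append
            writer.close();
            other.close();
        }
    }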
>
> *READ*
>
> •       Get the list of DataNodes holding each block from the NameNode, in
> topologically sorted (nearest-first) order, then read the data directly
> from those DataNodes.
>
> •       During the read, checksums are validated; if a replica's data does
> not match its checksum, the corruption is reported to the NameNode, which
> marks that replica for deletion.
>
> •       On an error while reading a block, the next replica in the list is
> used to read it.
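
A minimal read along these lines (the path is hypothetical; checksum
verification and fail-over to another replica happen inside the returned
stream):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path p = new Path("/tmp/demo.txt");   // hypothetical path

            // open() asks the NameNode for block locations; the client then
            // streams each block from the nearest DataNode, verifying checksums
            // and switching to another replica on error.
            try (FSDataInputStream in = fs.open(p)) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) > 0) {
                    System.out.write(buf, 0, n);
                }
                System.out.flush();
            }
            fs.close();
        }
    }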
>
>
>
> You can refer to the link below for creating test cases.
>
>
>
>
> http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
>
>
>
> Hope this helps!!!
>
>
>
> Thanks and Regards,
>
> S.RagavendraGanesh
>
> Hadoop Support Team
>
> ViSolve Inc.|www.visolve.com
>
>
>
>
>
>
>
> *From:* Dongzhe Ma [mailto:mdzfirst@gmail.com]
> *Sent:* Monday, February 02, 2015 10:50 AM
> *To:* user@hadoop.apache.org
> *Subject:* About HDFS's single-writer, multiple-reader model, any use
> case?
>
>
>
> We know that HDFS employs a single-writer, multiple-reader model, which
> means that there could be only one process writing to a file at the same
> time, but multiple readers can also work in parallel and new readers can
> even observe the new content. The reason for this design is to simplify
> concurrency control. But, is it necessary to support reading during
> writing? Can anyone bring up some use cases? Why not just lock the whole
> file, as other POSIX file systems do (in terms of locking granularity)?
>



-- 
Best wishes,

--------

Ma Dongzhe
Department of Computer Science and Technology
Tsinghua University
Beijing 100084, P.R. China
