hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Replication is done synchronously or asynchronously?
Date Thu, 26 Jan 2012 21:07:16 GMT

On Fri, Jan 27, 2012 at 12:27 AM, Zhenhua (Gerald) Guo <jenvor@gmail.com> wrote:
> I have two questions regarding the creation of replicas.
> - When a user uploads a file to HDFS, does the call return as soon as
> the first replica is created, or does the client need to wait until
> all replicas are created?
> - When the output of MapReduce jobs is written to HDFS (by reduce
> tasks), does the write return when the first replica is created, or
> does it wait until all replicas are created?

Both questions have the same answer, since both cases do the same form of DFS write.

In Apache Hadoop at present, writes are synchronous and replication is pipelined.

However, a write will succeed as long as at least 1 replica was written
(the threshold is controlled via dfs.replication.min). The pipeline can
lose DNs because of errors, or can get fewer DNs than requested because
of load/space issues, but the write will still succeed if it gets at
least one DN.
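
To make the success rule concrete, here is a rough toy model in Python. It is not HDFS code: the names pipelined_write and WriteFailed and the DataNode names are made up for illustration, and min_replication only stands in for dfs.replication.min. It assumes the simplest reading of the rule above, namely that the call returns only after the pipeline has been checked, and succeeds if enough DNs survived.

```python
# Toy model only: none of these names come from the Hadoop code base.
class WriteFailed(Exception):
    """Raised when fewer than min_replication DataNodes stored the block."""


def pipelined_write(pipeline, failing, min_replication=1):
    """Simulate one synchronous block write through a DataNode pipeline.

    pipeline: ordered list of DataNode names chosen for the block.
    failing: set of DataNode names that error out during the write.
    min_replication: stands in for dfs.replication.min.
    Returns the DataNodes that stored the block.
    """
    # DataNodes that fail are dropped; the rest still form the pipeline.
    survivors = [dn for dn in pipeline if dn not in failing]
    # The function returns only after this check, which models the client
    # waiting on the pipeline instead of returning after the first copy.
    if len(survivors) < min_replication:
        raise WriteFailed(
            f"only {len(survivors)} replica(s) written, need {min_replication}"
        )
    return survivors


if __name__ == "__main__":
    # Healthy pipeline: all three requested DNs hold a replica.
    print(pipelined_write(["dn1", "dn2", "dn3"], failing=set()))
    # Two DNs lost mid-write: still a success, since one replica exists.
    print(pipelined_write(["dn1", "dn2", "dn3"], failing={"dn2", "dn3"}))
```

With min_replication left at 1, only the case where every DN in the pipeline fails raises WriteFailed.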

Also see the whole conversation at
http://search-hadoop.com/m/bF99W1ZmNqz1 for some more tidbits you
might find interesting.

Harsh J
Customer Ops. Engineer, Cloudera
