hadoop-hdfs-user mailing list archives

From "Zhenhua (Gerald) Guo" <jen...@gmail.com>
Subject Re: Replication is done synchronously or asynchronously?
Date Thu, 26 Jan 2012 23:23:19 GMT
Thanks, Harsh J.  Your answer is quite helpful!
If I understand correctly, a write waits until all replicas are created,
provided no error occurs during replication.  If an error does occur in
the replication pipeline, dfs.replication.min comes into play.  Is my
understanding correct?

Gerald

On Thu, Jan 26, 2012 at 4:07 PM, Harsh J <harsh@cloudera.com> wrote:
> Hi,
>
> On Fri, Jan 27, 2012 at 12:27 AM, Zhenhua (Gerald) Guo <jenvor@gmail.com> wrote:
>> I have two questions regarding creation of replicas.
>> - When a user uploads a file to HDFS, it returns whenever the first
>> replica is created? or the client needs wait until all replicas are
>> created?
>> - When the output of MapReduce jobs is written to HDFS (by reduce
>> tasks), the writing of output returns when the first replica is
>> created? or wait until all replicas are created?
>
> Both questions are the same as both do the same form of DFS write.
>
> Writes are synchronous and replication is pipelined, presently in Apache Hadoop.
>
> But a write will succeed if at least 1 replica was written (controlled
> via dfs.replication.min). The pipeline can lose DNs to errors, or can
> get fewer DNs than requested because of load/space issues, but the
> write will succeed as long as it gets at least one DN.
>
> Also see the whole conversation at
> http://search-hadoop.com/m/bF99W1ZmNqz1 for some more tidbits you
> might find interesting.
>
> --
> Harsh J
> Customer Ops. Engineer, Cloudera
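
The rule discussed in this thread can be sketched as a toy model. This is not HDFS code and does not call any Hadoop API. The function name simulate_block_write, the WriteResult type, and the parameter names are hypothetical and exist only for illustration. It assumes the behaviour described above: the client asks for a number of replicas, some DataNodes may drop out of the pipeline, and the write is reported as successful as long as the surviving replicas number at least dfs.replication.min (assumed here to be 1, as in Harsh's reply).

```python
from dataclasses import dataclass


@dataclass
class WriteResult:
    succeeded: bool         # did the client's write return successfully?
    live_replicas: int      # replicas actually written by the pipeline
    under_replicated: bool  # succeeded, but with fewer replicas than requested


def simulate_block_write(requested_replicas, failed_datanodes, min_replication=1):
    """Toy model of the success rule described in the thread (not real HDFS logic).

    requested_replicas: replicas the client asked for
    failed_datanodes:   DataNodes lost from the pipeline during the write
    min_replication:    stand-in for dfs.replication.min (assumed value: 1)
    """
    if requested_replicas < 1 or min_replication < 1:
        raise ValueError("replica counts must be at least 1")
    if failed_datanodes < 0:
        raise ValueError("failed_datanodes must not be negative")

    # DataNodes that survive in the pipeline each hold one replica.
    live = max(requested_replicas - failed_datanodes, 0)

    # The write succeeds only if enough replicas were written.
    succeeded = live >= min_replication

    # A successful write may still have fewer replicas than requested.
    under_replicated = succeeded and live < requested_replicas

    return WriteResult(succeeded, live, under_replicated)


if __name__ == "__main__":
    print(simulate_block_write(3, 0))  # no errors: all three replicas written
    print(simulate_block_write(3, 2))  # two DNs lost: one replica, still succeeds
    print(simulate_block_write(3, 3))  # every DN lost: the write fails
```

In this model, a write with no pipeline errors waits for all requested replicas, which matches Gerald's reading, and the minimum only matters once DataNodes start dropping out.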
