hadoop-hdfs-user mailing list archives

From "Zhenhua (Gerald) Guo" <jen...@gmail.com>
Subject Re: Replication is done synchronously or asynchronously?
Date Fri, 27 Jan 2012 07:52:15 GMT
Thanks a lot!  Your reply thoroughly cleared up my confusion.

Gerald

On Fri, Jan 27, 2012 at 1:02 AM, Harsh J <harsh@cloudera.com> wrote:
> Yes you're correct.
>
> Also note that sometimes the request may be for 3 replicas but the
> NameNode may only be able to grant fewer because the remaining DNs are
> full/unreachable/loaded-with-threads, in which case the write will work
> with the smaller pipeline, so long as its size is >=
> dfs.replication.min.
>
> If the client gets 0 assignments when requesting a write, it runs into
> this: wiki.apache.org/hadoop/FAQ#What_does_.22file_could_only_be_replicated_to_0_nodes.2C_instead_of_1.22_mean.3F
>
> On Fri, Jan 27, 2012 at 4:53 AM, Zhenhua (Gerald) Guo <jenvor@gmail.com> wrote:
>> Thanks, Harsh J.  Your answer is quite helpful!
>> If I understand right, writes wait until all replicas are created if
>> there is no error during the replication process.  If there is any
>> error in the replication pipeline, dfs.replication.min comes into play.
>> Is my understanding correct?
>>
>> Gerald
>>
>> On Thu, Jan 26, 2012 at 4:07 PM, Harsh J <harsh@cloudera.com> wrote:
>>> Hi,
>>>
>>> On Fri, Jan 27, 2012 at 12:27 AM, Zhenhua (Gerald) Guo <jenvor@gmail.com> wrote:
>>>> I have two questions regarding creation of replicas.
>>>> - When a user uploads a file to HDFS, does the call return as soon as
>>>> the first replica is created, or does the client need to wait until
>>>> all replicas are created?
>>>> - When the output of MapReduce jobs is written to HDFS (by reduce
>>>> tasks), does the write return when the first replica is created, or
>>>> does it wait until all replicas are created?
>>>
>>> Both questions are the same as both do the same form of DFS write.
>>>
>>> Writes are synchronous and replication is pipelined, presently in Apache Hadoop.
>>>
>>> But a write will succeed if at least 1 replica was written (controlled
>>> via dfs.replication.min -- the pipeline can lose DNs due to errors, or
>>> can get fewer DNs than requested because of load/space issues, but the
>>> write will succeed if it gets at least one DN).
>>>
>>> Also see the whole conversation at
>>> http://search-hadoop.com/m/bF99W1ZmNqz1 for some more tidbits you
>>> might find interesting.
>>>
>>> --
>>> Harsh J
>>> Customer Ops. Engineer, Cloudera
>
>
>
> --
> Harsh J
> Customer Ops. Engineer, Cloudera
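
The rule described in this thread can be sketched as a toy model. This is
not the actual DFSClient or NameNode code: the class, enum, and parameter
names below are hypothetical, and the model only assumes what the thread
states (a write with 0 assigned DNs fails; otherwise it succeeds as long as
the surviving pipeline size is >= dfs.replication.min).

```java
// Toy model of the write rule discussed in this thread.
// All names here are hypothetical; this is not real HDFS client code.
final class PipelineWriteSketch {

    /** Possible results of a simulated block write. */
    enum Outcome { SUCCESS, FAILED_NO_DATANODES, FAILED_BELOW_MIN }

    /**
     * @param requested       replicas the client asked for (e.g. 3)
     * @param granted         DNs the NameNode could actually assign
     * @param lostDuringWrite DNs dropped from the pipeline due to errors
     * @param minReplication  stands in for dfs.replication.min
     */
    static Outcome write(int requested, int granted,
                         int lostDuringWrite, int minReplication) {
        if (requested < 1 || minReplication < 1
                || granted < 0 || lostDuringWrite < 0) {
            throw new IllegalArgumentException("invalid pipeline parameters");
        }
        // 0 assignments: the "could only be replicated to 0 nodes" case.
        if (granted == 0) {
            return Outcome.FAILED_NO_DATANODES;
        }
        // The pipeline is never larger than what was requested.
        int pipeline = Math.min(requested, granted);
        // DNs that are still in the pipeline when the write finishes.
        int surviving = Math.max(0, pipeline - lostDuringWrite);
        return surviving >= minReplication
                ? Outcome.SUCCESS
                : Outcome.FAILED_BELOW_MIN;
    }

    public static void main(String[] args) {
        // Asked for 3, got 2, lost 1 mid-write, minimum of 1.
        System.out.println(write(3, 2, 1, 1));
        // Asked for 3, NameNode could not assign any DN.
        System.out.println(write(3, 0, 0, 1));
    }
}
```

In this model the client still waits for the whole surviving pipeline (the
write is synchronous); the minimum only decides whether a shrunken pipeline
counts as a successful write.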
