Date: Fri, 27 Jan 2012 02:52:15 -0500
Subject: Re: Replication is done synchronously or asynchronously?
From: "Zhenhua (Gerald) Guo" <jenvor@gmail.com>
To: hdfs-user@hadoop.apache.org

Thanks a lot!  Your reply thoroughly cleared up my confusion.

Gerald

On Fri, Jan 27, 2012 at 1:02 AM, Harsh J <harsh@cloudera.com> wrote:
> Yes you're correct.
>
> Also note that sometimes the request may be for 3 replicas but the
> NameNode may only be able to grant fewer because the remaining DNs are
> full/unreachable/loaded-with-threads, in which case the write will work
> with the smaller pipeline, so long as its size is >=
> dfs.replication.min.
>
> If it gets 0 assignments when requesting a write, it runs into
> this: wiki.apache.org/hadoop/FAQ#What_does_.22file_could_only_be_replicated_to_0_nodes.2C_instead_of_1.22_mean.3F
>
> On Fri, Jan 27, 2012 at 4:53 AM, Zhenhua (Gerald) Guo <jenvor@gmail.com> wrote:
>> Thanks, Harsh J.  Your answer is quite helpful!
>> If I understand right, writes wait until all replicas are created if
>> there is no error during the replication process.  If there is any
>> error in the replication pipeline, dfs.replication.min comes into play.
>> Is my understanding correct?
>>
>> Gerald
>>
>> On Thu, Jan 26, 2012 at 4:07 PM, Harsh J <harsh@cloudera.com> wrote:
>>> Hi,
>>>
>>> On Fri, Jan 27, 2012 at 12:27 AM, Zhenhua (Gerald) Guo <jenvor@gmail.com> wrote:
>>>> I have two questions regarding creation of replicas.
>>>> - When a user uploads a file to HDFS, does the call return as soon
>>>> as the first replica is created, or does the client need to wait
>>>> until all replicas are created?
>>>> - When the output of MapReduce jobs is written to HDFS (by reduce
>>>> tasks), does the write return when the first replica is created,
>>>> or does it wait until all replicas are created?
>>>
>>> Both questions are the same as both do the same form of DFS write.
>>>
>>> Writes are synchronous and replication is pipelined, presently in Apache Hadoop.
>>>
>>> But a write will succeed if at least 1 replica was written (controlled
>>> via dfs.replication.min -- the pipeline can lose DNs due to errors, or can
>>> get fewer DNs than requested because of load/space issues, but the write
>>> will succeed if it gets at least one DN).
>>>
>>> Also see the whole conversation at
>>> http://search-hadoop.com/m/bF99W1ZmNqz1 for some more tidbits you
>>> might find interesting.
>>>
>>> --
>>> Harsh J
>>> Customer Ops. Engineer, Cloudera
>
>
>
> --
> Harsh J
> Customer Ops. Engineer, Cloudera