hadoop-mapreduce-user mailing list archives

From Daniel Schulz <danielschulz2...@hotmail.com>
Subject Re: Who will Responsible for Handling DFS Write Pipe line Failure
Date Mon, 07 Sep 2015 05:57:19 GMT
Hi Srinivas,

In Hadoop, most DFS accesses happen in two stages: first you query the NameNode (NN), then you
go down to the DataNodes (DN). So most of the time you first contact the master nodes for
metadata and then the worker nodes for the payload data.
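
To make the two stages concrete, here is a minimal Python sketch. It is only a toy in-memory model: ToyNameNode, ToyDataNode and read_file are hypothetical names I made up for illustration, not Hadoop classes or APIs.

```python
# Toy in-memory model of the two-stage access pattern.
# All names here are hypothetical and are not part of Hadoop.
class ToyDataNode:
    def __init__(self, name):
        self.name = name
        self.files = {}  # path -> payload bytes


class ToyNameNode:
    def __init__(self):
        self.locations = {}  # path -> list of DataNode names (metadata only)

    def register(self, path, dn_names):
        self.locations[path] = list(dn_names)

    def lookup(self, path):
        return self.locations[path]


def read_file(nn, datanodes, path):
    # Stage 1: ask the master node where the data lives.
    dn_names = nn.lookup(path)
    # Stage 2: fetch the payload from one of the worker nodes.
    return datanodes[dn_names[0]].files[path]


nn = ToyNameNode()
dns = {"DN_1": ToyDataNode("DN_1")}
dns["DN_1"].files["HomerQuotes.txt"] = b"D'oh!"
nn.register("HomerQuotes.txt", ["DN_1"])
print(read_file(nn, dns, "HomerQuotes.txt"))
```

Note that the NameNode object never holds the file contents, only the mapping from path to DataNode names.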

(1) In your scenario, you want to write the file named "HomerQuotes.txt" with a Replication
Factor of 3 (RF=3). First, you query the NN for the desired DNs to store your text file on. The
NN will respond with, let's assume, DN_1. Fine, let's go down to the worker nodes.

(2) Now your text file HomerQuotes.txt will be sent to the IP addresses or host names the NN
just sent you in (1). You now transmit your file completely to DN_1. When it arrives there, DN_1
reports back to the NN. As the RF of this file is supposed to be 3 but only one replica exists,
DN_1 will redistribute your file across the cluster twice. If, and only if, all three of those
replications/copy jobs succeed, DN_1 will report success back to the client. Otherwise
a failure is reported.
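
The write flow in (1) and (2) can be sketched in a few lines of Python, including your case where the second DN fails. This is a deliberately simplified toy under the assumptions of this mail: ToyDataNode and write_file are hypothetical names, not Hadoop APIs, and real HDFS works at block and packet granularity with its own recovery logic, none of which is modelled here.

```python
# Toy model of the write described above. ToyDataNode and write_file are
# hypothetical names for illustration, not Hadoop classes or APIs.
class ToyDataNode:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.files = {}  # path -> payload bytes

    def store(self, path, data):
        """Store a replica; return False if this node is down."""
        if not self.healthy:
            return False
        self.files[path] = data
        return True


def write_file(targets, path, data):
    """Send data to the first DN, which then copies it to the remaining DNs.

    Returns True only if every DN in `targets` ended up with a replica.
    """
    first, *others = targets
    if not first.store(path, data):
        return False
    # The first DN redistributes the file to reach the replication factor.
    results = [dn.store(path, first.files[path]) for dn in others]
    return all(results)


healthy = [ToyDataNode("DN_1"), ToyDataNode("DN_2"), ToyDataNode("DN_3")]
broken = [ToyDataNode("DN_1"), ToyDataNode("DN_2", healthy=False), ToyDataNode("DN_3")]

ok = write_file(healthy, "HomerQuotes.txt", b"D'oh!")
failed = write_file(broken, "HomerQuotes.txt", b"D'oh!")
print(ok, failed)
```

With three healthy nodes the write is reported as a success; with DN_2 down only two replicas exist, so the toy reports a failure, matching the "if, and only if" rule above.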

In real-world Hadoop, a client such as "$ hdfs dfs" or WebHDFS performs these stages for you.
But this is what is going on under the bonnet.

I hope this helps. Otherwise, feel free to contact us with more questions.

Best regards, Daniel.

> On 07 Sep 2015, at 07:09, miriyala srinivas <srinivas2828@gmail.com> wrote:
> Hi All,
> I am just started Learning fundamentals of HDFS and its internal mechanism, concepts
> used here are very impressive and looks simple but makes me confusing and my question is who
> will responsible for handling DFS write failure in pipe line (assume replication factor is
> 3 and 2nd DN failed in the pipeline)? if any data node failed during the pipe line write then
> the entire pipe line will get stopped? or new data node added to the existing pipe line? how
> this entire mechanism works? I really appreciate if someone with good knowledge of HDFS can
> explains to me.
> Note: I read bunch of documents but none seems to be explained what i am looking for.
> thanks
> srinivas