hadoop-common-user mailing list archives

From Uma Maheswara Rao G 72686 <mahesw...@huawei.com>
Subject Re: Need help understanding Hadoop Architecture
Date Sun, 23 Oct 2011 17:18:12 GMT

First of all, welcome to Hadoop.
----- Original Message -----
From: panamamike <panamamike@hotmail.com>
Date: Sunday, October 23, 2011 8:29 pm
Subject: Need help understanding Hadoop Architecture
To: core-user@hadoop.apache.org

> I'm new to Hadoop. I've read a few articles and presentations which
> are directed at explaining what Hadoop is, and how it works. Currently
> my understanding is Hadoop is an MPP system which leverages the use of
> large block size to quickly find data. In theory, I understand how a
> large block size along with an MPP architecture as well as using what
> I'm understanding to be a massive index scheme via mapreduce can be
> used to find data.
>
> What I don't understand is how, after you identify the appropriate
> 64MB block, you find the data you're specifically after. Does this
> mean the CPU has to search the entire 64MB block for the data of
> interest? If so, how does Hadoop know what data from that block to
> retrieve?
>
> I'm assuming the block is probably composed of one or more files. If
> not, I'm assuming the user isn't looking for the entire 64MB block but
> rather a portion of it.
I am just giving a brief overview of the file system here.

The Hadoop Distributed File System (HDFS) consists of the NameNode, DataNodes, checkpointing nodes, and the DFSClient.

The NameNode (NN) maintains the metadata about files and blocks.
DataNodes (DN) hold the actual data and send periodic heartbeats to the NameNode, so the NameNode
knows the status of each DataNode.
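To make the heartbeat idea concrete, here is a toy Python sketch (hypothetical class and method names, not Hadoop's API) of how a NameNode-like process could decide which DataNodes are alive: it records when each DataNode last reported in and treats any node silent for longer than a timeout as dead. The timeout value and the explicit `now` argument are assumptions for illustration; real HDFS uses its own configurable intervals.

```python
import time


class ToyNameNode:
    """Hypothetical toy model of heartbeat tracking, not Hadoop's API."""

    def __init__(self, timeout=30.0):
        self.timeout = timeout        # seconds of silence before a DataNode counts as dead (assumed value)
        self.last_heartbeat = {}      # DataNode id -> time its last heartbeat arrived

    def heartbeat(self, dn_id, now=None):
        """Record the arrival time of a heartbeat from a DataNode."""
        self.last_heartbeat[dn_id] = time.monotonic() if now is None else now

    def live_datanodes(self, now=None):
        """Return the sorted ids of DataNodes heard from within the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(dn for dn, t in self.last_heartbeat.items()
                      if now - t <= self.timeout)
```

For example, with `timeout=30.0`, a DataNode whose last heartbeat was at t=0 is reported dead at t=40, while one that reported at t=20 is still live.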

DFSClient is the client-side logic. To write a file, it first asks the NameNode for a set of
DataNodes. The NameNode adds the corresponding entries to its metadata and returns the DataNode
list to the client, which then writes the data to those DataNodes directly.
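The write path can be sketched the same way. This is a toy Python model under stated assumptions (hypothetical `NameNode`, `DataNode`, and `client_write` names, a 4-byte block size, naive round-robin placement, replication factor 2); it only shows the division of labor: the NameNode chooses DataNodes and records block metadata, and the client moves the bytes itself.

```python
import itertools

BLOCK_SIZE = 4  # bytes; deliberately tiny (HDFS blocks were 64 MB by default at the time)


class DataNode:
    """Toy DataNode: stores raw block bytes keyed by block id."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class NameNode:
    """Toy NameNode: keeps only metadata, never the file bytes themselves."""

    def __init__(self, datanodes, replication=2):
        if replication > len(datanodes):
            raise ValueError("not enough DataNodes for the replication factor")
        self.datanodes = datanodes
        self.replication = replication
        self.files = {}                     # path -> [(block_id, [DataNode, ...]), ...]
        self._next_id = itertools.count()
        self._rotation = itertools.count()  # naive round-robin placement (assumption)

    def add_block(self, path):
        """Allocate a block id, pick target DataNodes, and record them in the metadata."""
        start = next(self._rotation)
        n = len(self.datanodes)
        targets = [self.datanodes[(start + i) % n] for i in range(self.replication)]
        block_id = next(self._next_id)
        self.files.setdefault(path, []).append((block_id, targets))
        return block_id, targets


def client_write(namenode, path, data):
    """Toy DFSClient write: ask the NameNode where each block goes, then send bytes to DataNodes."""
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id, targets = namenode.add_block(path)
        chunk = data[offset:offset + BLOCK_SIZE]
        # Real HDFS streams through a DataNode pipeline; here the client writes each replica itself.
        for dn in targets:
            dn.blocks[block_id] = chunk
```

Writing `b"hello world"` to three DataNodes produces three blocks (`b"hell"`, `b"o wo"`, `b"rld"`), each stored on two DataNodes, while the NameNode holds only block ids and their locations.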

Reading works the same way: the client asks the NameNode for the block locations, then connects
directly to the DataNodes and reads the data, so file data never flows through the NameNode.
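Reading, and the question about finding data inside a 64MB block, can be illustrated with another toy sketch (hypothetical names, hard-coded state, 4-byte blocks). Given a byte offset and length, the NameNode side returns only the blocks that overlap that range, and each DataNode returns only the requested bytes of its block. Working out which byte range holds the records you want is left to the application; the sketch assumes it is already known.

```python
BLOCK_SIZE = 4  # bytes; deliberately tiny for illustration

# Hypothetical toy state, not real HDFS structures.
# NameNode metadata: path -> ordered blocks and the DataNodes holding each one.
NAMENODE_META = {
    "/user/mike/a.txt": [(0, ["dn1", "dn2"]), (1, ["dn2", "dn3"]), (2, ["dn3", "dn1"])],
}
# Each DataNode: block id -> block bytes.
DATANODES = {
    "dn1": {0: b"hell", 2: b"rld"},
    "dn2": {0: b"hell", 1: b"o wo"},
    "dn3": {1: b"o wo", 2: b"rld"},
}


def get_block_locations(path, offset, length):
    """NameNode side: return (block_start, block_id, locations) for blocks overlapping the range."""
    if length <= 0:
        return []
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return [(i * BLOCK_SIZE, block_id, locations)
            for i, (block_id, locations) in enumerate(NAMENODE_META[path])
            if first <= i <= last]


def datanode_read(dn_name, block_id, start, end):
    """DataNode side: return only the requested byte range of one block."""
    return DATANODES[dn_name][block_id][start:end]


def client_read(path, offset, length):
    """Toy DFSClient read: get locations from the NameNode, then fetch bytes from DataNodes."""
    end = offset + length
    out = bytearray()
    for block_start, block_id, locations in get_block_locations(path, offset, length):
        start = max(offset, block_start) - block_start
        stop = min(end, block_start + BLOCK_SIZE) - block_start
        # Use the first listed replica; a real client can fall back to another on failure.
        out += datanode_read(locations[0], block_id, start, stop)
    return bytes(out)
```

For example, `client_read("/user/mike/a.txt", 6, 3)` touches only blocks 1 and 2 and returns `b"wor"`.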

There are many other concepts as well, such as replication and lease monitoring.
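Replication ties back to the heartbeats: HDFS keeps several copies of each block (three by default), and when a DataNode stops sending heartbeats the NameNode arranges new copies of the blocks it held. Below is a toy sketch of that bookkeeping, with a hypothetical function name and a naive alphabetical choice of target nodes (real HDFS placement is rack-aware).

```python
def plan_rereplication(block_locations, live_datanodes, replication=3):
    """Hypothetical toy model, not Hadoop's API.

    block_locations maps block id -> set of DataNode names that held a replica;
    live_datanodes is the set of DataNodes still sending heartbeats.
    Returns {block_id: [new target DataNodes]} for under-replicated blocks.
    """
    plan = {}
    for block_id in sorted(block_locations):
        alive = block_locations[block_id] & live_datanodes
        missing = replication - len(alive)
        # Need at least one surviving replica to copy from; otherwise the block is lost.
        if missing > 0 and alive:
            candidates = sorted(live_datanodes - alive)  # naive choice, not rack-aware
            plan[block_id] = candidates[:missing]
    return plan
```

If `dn1` dies while holding replicas of blocks 1 and 2, only those two blocks get new targets; blocks whose replicas are all on live nodes are left alone.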

I hope this gives you an initial understanding of HDFS.
Please go through the document below, which explains the architecture very clearly.

> Any help indicating documentation, books, articles on the subject
> would be much appreciated.
Here is a document on Hadoop: http://db.trimtabs.com:2080/mindterm/ebooks/Hadoop_The_Definitive_Guide_Cr.pdf
> Regards,
> Mike
> -- 
> View this message in context: http://old.nabble.com/Need-help-understanding-Hadoop-Architecture-tp32705405p32705405.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.

