Date: Sun, 23 Oct 2011 22:18:12 +0500
From: Uma Maheswara Rao G 72686
Subject: Re: Need help understanding Hadoop Architecture
In-reply-to: <32705405.post@talk.nabble.com>
To: common-user@hadoop.apache.org
Cc: core-user@hadoop.apache.org

Hi,

First of all, welcome to Hadoop.

----- Original Message -----
From: panamamike
Date: Sunday, October 23, 2011 8:29 pm
Subject: Need help understanding Hadoop Architecture
To: core-user@hadoop.apache.org

> I'm new to Hadoop. I've read a few articles and presentations which are
> directed at explaining what Hadoop is and how it works. Currently my
> understanding is Hadoop is an MPP system which leverages the use of large
> block size to quickly find data. In theory, I understand how a large block
> size along with an MPP architecture, as well as what I understand to be a
> massive index scheme via MapReduce, can be used to find data.
>
> What I don't understand is how, after you identify the appropriate 64MB
> block size, do you find the data you're specifically after? Does this mean
> the CPU has to search the entire 64MB block for the data of interest? If
> so, how does Hadoop know what data from that block to retrieve?
>
> I'm assuming the block is probably composed of one or more files. If not,
> I'm assuming the user isn't looking for the entire 64MB block, rather a
> portion of it.

I am just giving a brief overview of the file system here. The distributed file system consists of a NameNode, DataNodes, checkpointing nodes, and the DFSClient. The NameNode maintains the metadata about files and blocks.
The DataNodes hold the actual data, and they send heartbeats to the NameNode, so the NameNode always knows the status of each DataNode. The DFSClient is the client-side logic: to write a file, it first asks the NameNode for a set of DataNodes. The NameNode records the entries in its metadata and returns the DataNode list to the client, and the client then writes the data directly to those DataNodes. Reading a file works the same way: the client asks the NameNode for the block locations, then connects to the DataNodes directly and reads the data. There are many other concepts as well, such as replication and lease monitoring. I hope this gives you an initial understanding of HDFS. Please go through the document below, which explains it very clearly with architecture diagrams.

> Any help indicating documentation, books, articles on the subject
> would be much appreciated.

Here is a doc for Hadoop:
http://db.trimtabs.com:2080/mindterm/ebooks/Hadoop_The_Definitive_Guide_Cr.pdf

> Regards,
>
> Mike
> --
> View this message in context: http://old.nabble.com/Need-help-understanding-Hadoop-Architecture-tp32705405p32705405.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.

Regards,
Uma
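P.S. On your question about searching inside a 64MB block: the client does not scan the whole block. The NameNode maps a file offset to a block, and the client seeks straight to the right position inside that block. A rough, self-contained sketch of that offset arithmetic (this is just an illustration, not Hadoop's actual code; the class and method names are mine, and the 64MB block size is the default discussed above):

```java
// Illustration only: how a client can map a byte offset within a file
// to (block index, offset inside that block), assuming 64MB blocks.
public class BlockLocator {
    static final long BLOCK_SIZE = 64L * 1024 * 1024; // 64MB default

    // Which block of the file holds the byte at this offset?
    static long blockIndex(long fileOffset) {
        return fileOffset / BLOCK_SIZE;
    }

    // Where inside that block does the byte live?
    static long offsetInBlock(long fileOffset) {
        return fileOffset % BLOCK_SIZE;
    }

    public static void main(String[] args) {
        long offset = 150L * 1024 * 1024; // byte 150MB into the file
        // 150MB falls in the third block (index 2), 22MB into it.
        System.out.println("block #" + blockIndex(offset)
                + ", offset " + offsetInBlock(offset));
    }
}
```

So a read of a portion of a file only touches the block (and the position within it) that actually contains those bytes; the NameNode's metadata plus this simple arithmetic replaces any scan.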