Date: Sun, 23 Oct 2011 22:18:12 +0500
From: Uma Maheswara Rao G 72686
Subject: Re: Need help understanding Hadoop Architecture
In-reply-to: <32705405.post@talk.nabble.com>
To: common-user@hadoop.apache.org
Cc: core-user@hadoop.apache.org

Hi,

First of all, welcome to Hadoop.

----- Original Message -----
From: panamamike
Date: Sunday, October 23, 2011 8:29 pm
Subject: Need help understanding Hadoop Architecture
To: core-user@hadoop.apache.org

> I'm new to Hadoop. I've read a few articles and presentations which are
> directed at explaining what Hadoop is and how it works. Currently my
> understanding is Hadoop is an MPP system which leverages the use of large
> block size to quickly find data. In theory, I understand how a large block
> size along with an MPP architecture, as well as what I understand to be a
> massive index scheme via MapReduce, can be used to find data.
>
> What I don't understand is how, after you identify the appropriate 64MB
> block size, do you find the data you're specifically after? Does this mean
> the CPU has to search the entire 64MB block for the data of interest? If
> so, how does Hadoop know what data from that block to retrieve?
>
> I'm assuming the block is probably composed of one or more files. If not,
> I'm assuming the user isn't looking for the entire 64MB block, rather a
> portion of it.

I am just giving a brief overview of the file system here. The distributed file system consists of a NameNode, DataNodes, checkpointing nodes, and the DFSClient. The NameNode maintains the metadata about files and blocks.
The DataNodes hold the actual data, and they send heartbeats to the NameNode, so the NameNode always knows the status of each DataNode. The DFSClient is the client-side logic: to write a file, it first asks the NameNode for a set of DataNodes. The NameNode records the entries in its metadata and returns the DataNode list to the client, and the client then writes the data directly to those DataNodes. Reading a file works the same way: the client asks the NameNode for the block locations, then connects to the DataNodes directly and reads the data. There are many other concepts as well, such as replication and lease monitoring. I hope this gives you an initial understanding of HDFS. Please go through the document below, which explains it very clearly with architecture diagrams.

> Any help indicating documentation, books, articles on the subject
> would be much appreciated.

Here is a doc for Hadoop:
http://db.trimtabs.com:2080/mindterm/ebooks/Hadoop_The_Definitive_Guide_Cr.pdf

> Regards,
>
> Mike
> --
> View this message in context: http://old.nabble.com/Need-help-understanding-Hadoop-Architecture-tp32705405p32705405.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.

Regards,
Uma
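P.S. On your question about searching inside a 64MB block: the client does not scan the whole block. The NameNode maps a file offset to a block, and the client seeks straight to the right position inside that block. A rough, self-contained sketch of that offset arithmetic (this is just an illustration, not Hadoop's actual code; the class and method names are mine, and the 64MB block size is the default discussed above):

```java
// Illustration only: how a client can map a byte offset within a file
// to (block index, offset inside that block), assuming 64MB blocks.
public class BlockLocator {
    static final long BLOCK_SIZE = 64L * 1024 * 1024; // 64MB default

    // Which block of the file holds the byte at this offset?
    static long blockIndex(long fileOffset) {
        return fileOffset / BLOCK_SIZE;
    }

    // Where inside that block does the byte live?
    static long offsetInBlock(long fileOffset) {
        return fileOffset % BLOCK_SIZE;
    }

    public static void main(String[] args) {
        long offset = 150L * 1024 * 1024; // byte 150MB into the file
        // 150MB falls in the third block (index 2), 22MB into it.
        System.out.println("block #" + blockIndex(offset)
                + ", offset " + offsetInBlock(offset));
    }
}
```

So a read of a portion of a file only touches the block (and the position within it) that actually contains those bytes; the NameNode's metadata plus this simple arithmetic replaces any scan.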