Date: Fri, 12 Apr 2013 14:27:09 +0530
Subject: Re: Will HDFS refer to the memory of NameNode & DataNode or is it a separate machine
From: Nitin Pawar
To: user@hadoop.apache.org, Sai Sai

HDFS stands for Hadoop Distributed File System, and as the name says, it is a file system. The first basic question you will need to look into is: do you need a process to run a file system?
Once you have an answer to that, the second question is: will a single process be enough for a distributed system, given that sub-components of the system may exist on different machines?

The namenode and the datanodes combined make up HDFS. There is no separate HDFS process; all of their processes taken together are HDFS.

The namenode is the master for HDFS and keeps the file system image in memory. When it starts, it loads the image into memory and serves all requests from memory from then on. There are steps taken to save the FSImage to disk; you can read about them in detail in the HDFS architecture documentation.
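
To make the "load into memory, serve from memory, save to disk" idea concrete, here is a toy sketch in Python. It is not how the real namenode is implemented: the class name ToyNameNode, the JSON file format and the method names are all made up for illustration, and the real namenode uses its own on-disk format together with an edit log.

```python
import json
import os
import tempfile


class ToyNameNode:
    """Hypothetical, simplified model of namenode metadata handling."""

    def __init__(self, image_path):
        self.image_path = image_path
        self.namespace = {}  # file path -> list of block ids, held in memory
        # On startup, load the saved image (if there is one) into memory.
        if os.path.exists(image_path):
            with open(image_path) as f:
                self.namespace = json.load(f)

    def add_file(self, path, block_ids):
        # Only metadata is recorded here; file data never reaches the namenode.
        self.namespace[path] = list(block_ids)

    def lookup(self, path):
        # Requests are answered from the in-memory map, not from disk.
        return self.namespace[path]

    def checkpoint(self):
        # Write to a temporary file first, then rename it over the old image,
        # so a crash in the middle does not leave a half-written image.
        tmp_path = self.image_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.namespace, f)
        os.replace(tmp_path, self.image_path)


if __name__ == "__main__":
    image = os.path.join(tempfile.mkdtemp(), "fsimage.json")
    nn = ToyNameNode(image)
    nn.add_file("/user/sai/data.txt", ["blk_1", "blk_2"])
    nn.checkpoint()
    # A "restarted" namenode reloads the same metadata from the saved image.
    print(ToyNameNode(image).lookup("/user/sai/data.txt"))
```

The point of the sketch is only that the metadata lives in one process's memory and is checkpointed to that machine's disk, so there is no extra machine holding "the HDFS memory".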

When you put a file in HDFS, it may or may not go to a single machine. The namenode never stores the data files; it just stores the metadata for HDFS.
So when you load a file, the data goes to the datanodes and the file information goes to the namenode. Depending on its size, the file is split into multiple blocks, and those blocks may land on multiple datanodes. If your file size is less than or exactly equal to the block size, the file is a single block and you can find out which datanode it is located on. Otherwise, if you are running in fully distributed mode, there is no guarantee that the whole file will be on a single node.
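
The block arithmetic can be sketched in a few lines of Python. This is only an illustration under assumptions: the 64 MB block size, the datanode names and the replication factor are example values, and place_blocks uses a simple round-robin that is not the real HDFS placement policy.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size):
    """Return the sizes (in bytes) of the blocks a file would occupy."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block may be smaller than the rest
    return sizes


def place_blocks(num_blocks, datanodes, replication):
    """Toy round-robin placement (an assumption, NOT the real HDFS policy)."""
    if not 1 <= replication <= len(datanodes):
        raise ValueError("replication must be between 1 and len(datanodes)")
    placement = []
    for block in range(num_blocks):
        replicas = [datanodes[(block + r) % len(datanodes)]
                    for r in range(replication)]
        placement.append(replicas)
    return placement


def hosts_of(placement):
    """All datanodes that hold at least one block of the file."""
    return sorted({node for replicas in placement for node in replicas})


if __name__ == "__main__":
    blocks = split_into_blocks(200 * MB, 64 * MB)
    print([size // MB for size in blocks])
    print(place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"], 3))
```

A file that fits in one block gives a single entry in the placement list, while a larger file gives several entries that may point at different datanodes.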

PS: this is my understanding. Others may correct me as well.


On Fri, Apr 12, 2013 at 2:00 PM, Sai Sai <saigraph@yahoo.in> wrote:
> A few basic questions:
>
> Will HDFS refer to the memory of NameNode & DataNode or is it a separate machine.
>
> For NameNode, DataNode and others there is a process associated with each of em.
> But no process is for HDFS, wondering why? I understand that fsImage has the meta data of the HDFS, so when NameNode or DataNode or JobTracker/TT needs to get file info will they just look into the fsImage.
>
> When we put a file in HDFS is it possible to look/find in which node (NN/DN) it physically sits.
>
> Any help is appreciated.
> Thanks
> Sai



--
Nitin Pawar