hadoop-hdfs-user mailing list archives

From: Harold Lim <rold...@yahoo.com>
Subject: Re: Question about HDFS Architecture
Date: Tue, 25 Aug 2009 02:00:22 GMT
Hi Konstantin,

How long does the client keep this information in its cache? Or does it
keep using it until it becomes invalid (i.e., the client contacts a data
node that no longer has that particular file)?

Thanks,
Harold

--- On Mon, 8/24/09, Konstantin Shvachko <shv@yahoo-inc.com> wrote:

> From: Konstantin Shvachko <shv@yahoo-inc.com>
> Subject: Re: Question about HDFS Architecture
> To: hdfs-user@hadoop.apache.org
> Date: Monday, August 24, 2009, 9:40 PM
> Harold,
>
> Both answers by Aaron were incorrect.
>
> > Does the client cache this information, or does it always talk to
> > the namenode first?
>
> Yes, the client caches replica locations received from the name-node.
> On open() it receives the locations of the first 10 blocks of the
> file. In most cases these are all of the file's blocks. If not, the
> client will get another portion of blocks when needed, and will also
> cache them.
>
> > Also, if a file has multiple replicas stored on multiple datanodes
> > on the same "rack", how does the namenode pick which datanode the
> > client has to talk to?
>
> The name-node returns block locations ordered by proximity to the
> client. The client always contacts data-nodes in this order. It cannot
> make any decisions about proximity because it does not possess
> knowledge about the cluster topology.
> If all replicas are on the same rack but not local to the client, then
> the ordering returned by the name-node is arbitrary. This may happen
> mostly if the network topology is not configured. Otherwise replicas
> should be distributed across different racks: 3 replicas should be on
> at least 2 racks.
>
> Thanks
> --Konstantin
>
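A minimal Python sketch of the caching behaviour described above, assuming a hypothetical name-node stub. FakeNameNode, get_block_locations and CachingReader are illustrative names, not the HDFS client API. On open the reader fetches locations for a batch of 10 blocks and caches them; a lookup outside the cached range fetches and caches the next batch. It does not model cache expiry or invalidation, which is exactly what the follow-up question asks about.

```python
from dataclasses import dataclass

# Batch size taken from the thread: open() returns locations of the first 10 blocks.
PREFETCH_BLOCKS = 10


@dataclass
class FakeNameNode:
    """Hypothetical name-node stub: path -> list of replica lists, one per block."""
    files: dict
    calls: int = 0  # counts round trips so the caching effect is visible

    def get_block_locations(self, path, start, count):
        self.calls += 1
        return self.files[path][start:start + count]


class CachingReader:
    """Hypothetical client-side reader that caches block locations for one file."""

    def __init__(self, namenode, path):
        self.namenode = namenode
        self.path = path
        self.cache = {}  # block index -> list of replica locations
        self._fetch(0)   # like open(): prefetch the first batch

    def _fetch(self, start):
        batch = self.namenode.get_block_locations(self.path, start, PREFETCH_BLOCKS)
        for index, replicas in enumerate(batch, start):
            self.cache[index] = replicas

    def locations(self, block_index):
        # Only go back to the name-node when the block is not cached yet.
        if block_index not in self.cache:
            self._fetch(block_index)
        if block_index not in self.cache:
            raise IndexError(f"block {block_index} is past the end of {self.path}")
        return self.cache[block_index]
```

With a 25-block file, opening costs one name-node call, lookups within blocks 0-9 cost nothing extra, and a lookup of block 12 triggers a second call that caches blocks 12-21.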
> 
> Harold Lim wrote:
> > To read/get a file, I understand that a client first contacts the
> > namenode to determine which datanode has the file/block. Then, it
> > contacts the datanode for the actual file.
> >
> > Does the client cache this information, or does it always talk to
> > the namenode first?
> > Also, if a file has multiple replicas stored on multiple datanodes
> > on the same "rack", how does the namenode pick which datanode the
> > client has to talk to? In this case, all datanodes are homogeneous,
> > which makes the "rack-awareness" unimportant to the decision making.
> >
> > Thanks,
> > Harold
> 
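The proximity ordering Konstantin describes can be sketched as follows. This assumes topology paths of the form /rack/node and a simple tree distance (hops up to the closest common ancestor, counted on both sides); distance, order_replicas and the example paths are hypothetical, and the real name-node's distance metric may differ. The sort runs on the name-node side, and the client just walks the returned list in order. Replicas are shuffled before a stable sort by distance, so replicas that are equally far from the client (such as several nodes on one remote rack) come back in arbitrary order.

```python
import random


def distance(a, b):
    """Tree distance between two topology paths such as '/rack1/dn1' (assumed format)."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    # Hops from each node up to their closest common ancestor.
    return (len(pa) - common) + (len(pb) - common)


def order_replicas(client, replicas, rng=random):
    """Hypothetical name-node helper: nearest replicas first, ties in arbitrary order."""
    shuffled = list(replicas)
    rng.shuffle(shuffled)  # randomize first so equal-distance replicas have no fixed order
    return sorted(shuffled, key=lambda r: distance(client, r))  # sorted() is stable
```

For a client at /rack1/dn1, a local replica (distance 0) comes first, then one on the same rack (2), then one off-rack (4).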


      
