hadoop-hdfs-user mailing list archives

From Konstantin Shvachko <...@yahoo-inc.com>
Subject Re: Question about HDFS Architecture
Date Tue, 25 Aug 2009 02:07:16 GMT
Yes, the client continues to use the info until it becomes invalid.
After that it contacts the name-node and updates the cache.
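
A rough sketch of that caching behaviour, in Python rather than the actual Java client code. FakeNameNode, CachingClient and all of their methods are made-up names for illustration, not HDFS APIs; the batch size of 10 is taken from the earlier reply quoted below.

```python
PREFETCH_BLOCKS = 10  # block locations returned per name-node call (from the thread)


class FakeNameNode:
    """Hypothetical stand-in for the name-node: block index -> replica hosts."""

    def __init__(self, block_map):
        self.block_map = block_map
        self.calls = 0  # counts round trips so the effect of caching is visible

    def get_block_locations(self, first_block, count):
        self.calls += 1
        last = min(first_block + count, len(self.block_map))
        return {i: list(self.block_map[i]) for i in range(first_block, last)}


class CachingClient:
    """Hypothetical client that keeps replica locations until told they are stale."""

    def __init__(self, name_node):
        self.name_node = name_node
        self.cache = {}

    def open(self):
        # On open() the client receives locations of the first blocks of the file.
        self.cache.update(self.name_node.get_block_locations(0, PREFETCH_BLOCKS))

    def locations(self, block):
        # Ask the name-node for another portion only if this block is not cached.
        if block not in self.cache:
            self.cache.update(
                self.name_node.get_block_locations(block, PREFETCH_BLOCKS)
            )
        return self.cache[block]

    def invalidate(self, block):
        # Called when a data-node turns out not to hold the block any more;
        # the next locations() call goes back to the name-node.
        self.cache.pop(block, None)
```

With a 12-block file, open() costs one name-node call, reading blocks 0 to 9 costs none, block 10 triggers a second call, and a block is only looked up again after invalidate() has dropped it.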

--Konstantin

Harold Lim wrote:
> Hi Konstantin,
> 
> How long does the client keep the info in its cache? Or does it continue to use the info,
> until it becomes invalid (i.e., contacting a data node but the data node does not have that
> particular file anymore)?
> 
> Thanks,
> Harold
> 
> --- On Mon, 8/24/09, Konstantin Shvachko <shv@yahoo-inc.com> wrote:
> 
>> From: Konstantin Shvachko <shv@yahoo-inc.com>
>> Subject: Re: Question about HDFS Architecture
>> To: hdfs-user@hadoop.apache.org
>> Date: Monday, August 24, 2009, 9:40 PM
>> Harold,
>>
>> Both answers by Aaron were incorrect.
>>
>>> Does the client cache this information, or does it always talk to the namenode first?
>>
>> Yes, the client caches replica locations received from the name-node.
>> On open() it receives the locations of the first 10 blocks of the file.
>> In most cases these are all of the file's blocks. If not, the client
>> gets another portion of blocks when needed, and caches them as well.
>>
>>> Also, if a file has multiple replicas stored on multiple datanodes on the same "rack",
>>> how does the namenode pick which datanode the client has to talk to?
>>
>> The name-node returns block locations ordered by their proximity to the client.
>> The client always contacts data-nodes in this order. It cannot make any decisions
>> about proximity itself because it has no knowledge of the cluster topology.
>> If all replicas are on the same rack but none is local to the client, then the
>> ordering returned by the name-node is arbitrary.
>> This mostly happens when network topology is not configured.
>> Otherwise replicas should be distributed across different racks:
>> 3 replicas should be on at least 2 racks.
>>
>> Thanks
>> --Konstantin
>>
>>
>> Harold Lim wrote:
>>> To read/get a file, I understand that a client first contacts the namenode to determine
>>> which datanode has the file/block. Then, it contacts the datanode for the actual file.
>>> Does the client cache this information, or does it always talk to the namenode first?
>>> Also, if a file has multiple replicas stored on multiple datanodes on the same "rack",
>>> how does the namenode pick which datanode the client has to talk to? In this case,
>>> all datanodes are homogeneous, which makes the "rack-awareness" unimportant to the
>>> decision making.
>>> Thanks,
>>> Harold
> 
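
The replica ordering described in the quoted reply can be sketched as follows. This is an illustrative Python sketch, not name-node code: distance(), order_replicas(), the host-to-rack dict, the "/default-rack" fallback and the 0/2/4 weights are all assumptions chosen for the example.

```python
DEFAULT_RACK = "/default-rack"  # assumed fallback when no topology is configured


def distance(client, replica, topology):
    """Illustrative proximity: 0 = same host, 2 = same rack, 4 = different rack."""
    if client == replica:
        return 0
    if topology.get(client, DEFAULT_RACK) == topology.get(replica, DEFAULT_RACK):
        return 2
    return 4


def order_replicas(client, replicas, topology):
    """Return replica hosts sorted nearest-first, as the name-node would hand them out.

    sorted() is stable, so replicas at equal distance keep the order they were
    passed in; that stands in for the "arbitrary" ordering mentioned above.
    """
    return sorted(replicas, key=lambda host: distance(client, host, topology))


if __name__ == "__main__":
    topology = {"client-host": "/rack1", "dn-a": "/rack2", "dn-b": "/rack1"}
    print(order_replicas("client-host", ["dn-a", "dn-b", "client-host"], topology))
```

The client in this sketch never sees the topology dict; it only receives the sorted list and tries the hosts in that order, which matches the point that the client cannot judge proximity on its own.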
