hadoop-hdfs-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: Question about HDFS Architecture
Date Fri, 21 Aug 2009 07:36:11 GMT
On Thu, Aug 20, 2009 at 3:44 PM, Harold Lim <rold_50@yahoo.com> wrote:

> To read/get a file, I understand that a client first contacts the namenode
> to determine which datanode has the file/block. Then, it contacts the
> datanode for the actual file.
> Does the client cache this information, or does it always talk to the
> namenode first?

The latter.

> Also, if a file has multiple replicas stored on multiple datanodes on the
> same "rack", how does the namenode pick which datanode the client has to
> talk to? In this case, all datanodes are homogeneous, which makes the
> "rack-awareness" unimportant to the decision making.

I believe the datanode itself picks. In the absence of rack information,
its choice is random (unless one is localhost, in which case that one gets
chosen).
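
A toy sketch of the selection rule as I understand it, in case it helps. This is not HDFS code: the function name `pick_replica`, its parameters, and the host names are all made up for illustration, and it assumes no rack information is available.

```python
import random


def pick_replica(replica_hosts, client_host, rng=random):
    """Pick the host to read a block from (hypothetical helper, not an HDFS API).

    Prefers the client's own host when it holds a replica; otherwise picks
    one of the replica hosts uniformly at random.
    """
    if not replica_hosts:
        raise ValueError("block has no known replicas")
    if client_host in replica_hosts:
        # A replica on the same machine as the reader always wins.
        return client_host
    # No rack information to rank by, so every replica is equally good.
    return rng.choice(replica_hosts)


replicas = ["dn1.example.com", "dn2.example.com", "dn3.example.com"]
print(pick_replica(replicas, "dn2.example.com"))     # the local replica
print(pick_replica(replicas, "client.example.com"))  # any one of the three
```

With rack information available, the random step would presumably be replaced by a ranking on network distance, which is outside what this sketch covers.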

- Aaron

> Thanks,
> Harold
