hadoop-hdfs-user mailing list archives

From Sesha Kumar <sesha...@gmail.com>
Subject Re: Regarding design of HDFS
Date Mon, 05 Sep 2011 14:29:01 GMT
On Thu, Aug 25, 2011 at 1:34 PM, Sesha Kumar <sesha911@gmail.com> wrote:

> Hi all,
> I am trying to get a good understanding of how Hadoop works, for my
> undergraduate project. I have the following questions/doubts:
> 1. Why does the namenode store the blockmap (block to datanode mapping) in
> main memory for all files, even those that are not in use?
> 2. Why can't the namenode move part of the blockmap from main memory to a
> secondary storage device when free space in main memory becomes scarce
> (due to a large number of files)?
> 3. Why can't the blockmap be constructed when a file is requested (by a
> client) and then be cached for later accesses?

This is a follow-up to my earlier post, quoted above.
From what I've read and understood:
1. The namenode keeps the blockmap for all blocks in its main memory, which
lets it maintain an up-to-date snapshot of the whole filesystem. But the
blockmap is not constant data, so I feel that keeping all of it in main
memory all the time could be avoided in order to save memory. The blockmap
details for a file could instead be fetched when a client requests that file.
Since main memory limits how many files can be added to the filesystem, as
in the case of small files, this approach could save space. Only the first
fetch would take longer; after that the data can be streamed as usual.
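
The on-demand idea described above can be sketched as a small LRU cache. This
is only an illustration of the proposal, not how the namenode is implemented:
the class name `BlockMapCache`, the `load_from_disk` callback, and the idea
that a file's blockmap can be read back from some secondary store are all
assumptions made for the example.

```python
from collections import OrderedDict


class BlockMapCache:
    """Hypothetical LRU cache: file path -> list of (block_id, datanodes)."""

    def __init__(self, capacity, load_from_disk):
        self.capacity = capacity              # max number of files kept in memory
        self.load_from_disk = load_from_disk  # assumed slow lookup on secondary storage
        self.entries = OrderedDict()          # ordered oldest -> most recently used
        self.misses = 0                       # number of slow lookups performed

    def get(self, path):
        if path in self.entries:
            # Cache hit: mark the file as most recently used.
            self.entries.move_to_end(path)
            return self.entries[path]
        # Cache miss: pay the one-time cost of the slow lookup.
        self.misses += 1
        block_map = self.load_from_disk(path)
        self.entries[path] = block_map
        if len(self.entries) > self.capacity:
            # Evict the least recently used file's blockmap.
            self.entries.popitem(last=False)
        return block_map


# Toy stand-in for the secondary store (made-up paths and datanode names).
store = {
    "/a": [(1, ["dn1", "dn2"])],
    "/b": [(2, ["dn2", "dn3"])],
    "/c": [(3, ["dn1", "dn3"])],
}
cache = BlockMapCache(capacity=2, load_from_disk=store.__getitem__)
for path in ["/a", "/b", "/a", "/c"]:
    cache.get(path)
```

With a capacity of two files, the access sequence above causes three slow
lookups and evicts "/b", so the memory saved is paid for with extra latency on
every miss.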

I want to know why this was not considered or, if it was considered, why it
was not implemented.
Am I missing anything obvious?
All replies from the namenode are for heartbeat signals. I am not sure about
the time trade-off. Will it be much bigger? Is the initial access time as
important as streaming access?
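
One rough way to reason about the time trade-off asked about above is an
expected-cost estimate: weight the in-memory lookup time by the hit rate and
the secondary-storage lookup time by the miss rate. The function name and the
numbers below are made up for illustration; they are not measurements from
any Hadoop cluster.

```python
def expected_lookup_ms(hit_rate, memory_ms, disk_ms):
    """Average blockmap lookup time for a given cache hit rate (0.0 to 1.0)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * memory_ms + (1.0 - hit_rate) * disk_ms


# Assumed figures: 0.001 ms for an in-memory lookup, 10 ms for a disk lookup.
all_in_memory = expected_lookup_ms(1.0, 0.001, 10.0)
mostly_cached = expected_lookup_ms(0.9, 0.001, 10.0)
```

Under these assumed figures, even a 90% hit rate makes the average lookup
roughly a thousand times slower than keeping everything in memory, which is
the kind of cost the proposal would have to justify.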
