hadoop-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Using a hard drive instead of
Date Fri, 12 Oct 2012 04:16:24 GMT
Hi Mark,

Note that the NameNode does random memory accesses to serve every
read or mutation request you send it, and that it may be serving many
concurrent clients at once. So do you mean a 'very fast hard drive'
that is faster than RAM at random access itself? The NameNode does
persist its namespace and block information to disk (the fsimage and
edit log) for durability, but actually making the NameNode work
entirely off disk storage (rather than relying on memory for the hot
parts of it) wouldn't make much sense to me. Performance-wise, that
would feel like talking to a process that's swapping.
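
For what it's worth, where that on-disk copy lives is already
configurable. A minimal hdfs-site.xml sketch (the paths are just
placeholders; the property is dfs.name.dir on 1.x, renamed
dfs.namenode.name.dir on 2.x):

  <!-- Directories where the NameNode persists its fsimage and edit
       log; a comma-separated list writes a redundant copy to each. -->
  <property>
    <name>dfs.name.dir</name>
    <value>/disk1/hdfs/name,/disk2/hdfs/name</value>
  </property>

That only controls durability of the metadata, though; lookups are
still served from the in-memory image.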

The 'too many files' issue gets blown up to sound like a NameNode
problem, but in reality it isn't. HDFS lets you process lots of files
really fast, besides storing them for long periods, and a lot of tiny
files only slows you down in such jobs through the per-file overhead
of opening and closing them just to read them all. With a single
large file, or a few of them, nearly all you do is block (data)
reads, with very few NameNode calls, and you end up going much
faster. The same is true of local filesystems as well, but not many
people think of that.
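
To make the overhead concrete, here is a rough sketch against the
standard org.apache.hadoop.fs API (the paths are hypothetical):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import java.io.IOException;

  public class SmallFilesOverhead {
    public static void main(String[] args) throws IOException {
      FileSystem fs = FileSystem.get(new Configuration());

      // Many tiny files: every open() is a NameNode round trip just
      // to fetch block locations, before a single byte is read.
      for (FileStatus stat : fs.listStatus(new Path("/data/tiny"))) {
        FSDataInputStream in = fs.open(stat.getPath()); // NameNode RPC
        in.read(new byte[4096]);                        // tiny data read
        in.close();
      }

      // One big file: a single open(), then long sequential reads
      // streamed from the DataNodes, with very few NameNode calls
      // relative to the amount of data moved.
      FSDataInputStream in = fs.open(new Path("/data/one-big-file"));
      byte[] buf = new byte[1024 * 1024];
      while (in.read(buf) != -1) { /* block (data) reads */ }
      in.close();
    }
  }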

On Fri, Oct 12, 2012 at 9:29 AM, Mark Kerzner <mark.kerzner@shmsoft.com> wrote:
> Hi,
>
> Imagine I have a very fast hard drive that I want to use for the NameNode.
> That is, I want the NameNode to store its block information on this hard
> drive instead of in memory.
>
> Why would I do it? Scalability (no federation needed), many files are not a
> problem, and warm fail-over is automatic. What would I need to change in the
> NameNode to tell it to use the hard drive?
>
> Thank you,
> Mark



-- 
Harsh J
