hadoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lance Norskog <goks...@gmail.com>
Subject Re: Using a hard drive instead of
Date Fri, 12 Oct 2012 05:01:47 GMT
This is why memory-mapped files were invented.

On Thu, Oct 11, 2012 at 9:34 PM, Gaurav Sharma
<gaurav.gs.sharma@gmail.com> wrote:
> If you don't mind sharing, what hard drive do you have with these
> properties:
> -"performance of RAM"
> -"can accommodate very many threads"
> On Oct 11, 2012, at 21:27, Mark Kerzner <mark.kerzner@shmsoft.com> wrote:
> Harsh,
> I agree with you about many small files, and I was giving this only in way
> of example. However, the hard drive I am talking about can be 1-2 TB in
> size, and that's pretty good, you can't easily get that much memory. In
> addition, it would be more resistant to power failures than RAM. And yes, it
> has the performance of RAM, and can accommodate very many threads.
> Mark
> On Thu, Oct 11, 2012 at 11:16 PM, Harsh J <harsh@cloudera.com> wrote:
>> Hi Mark,
>> Note that the NameNode does random memory access to serve back any
>> information or mutate request you send to it, and that there can be
>> several number of concurrent clients. So do you mean a 'very fast hard
>> drive' thats faster than the RAM for random access itself? The
>> NameNode does persist its block information onto disk for various
>> purposes, but to actually make the NameNode use disk storage
>> completely (and not specific parts of it disk-cached instead) wouldn't
>> make too much sense to me. That'd feel like trying to communicate with
>> a process thats swapping, performance-wise.
>> The too many files issue is bloated up to sound like its a NameNode
>> issue but it isn't in reality. HDFS allows you to process lots of
>> files really fast, aside of helping store them for long periods, and a
>> lot of tiny files only gets you down in such operations with overheads
>> of opening and closing files in the way of reading them all at a time.
>> With a single or a few large files, all you do is block (data) reads,
>> and very few NameNode communications - ending up going much faster.
>> This is the same for local filesystems as well, but not many think of
>> that.
>> On Fri, Oct 12, 2012 at 9:29 AM, Mark Kerzner <mark.kerzner@shmsoft.com>
>> wrote:
>> > Hi,
>> >
>> > Imagine I have a very fast hard drive that I want to use for the
>> > NameNode.
>> > That is, I want the NameNode to store its blocks information on this
>> > hard
>> > drive instead of in memory.
>> >
>> > Why would I do it? Scalability (no federation needed), many files are
>> > not a
>> > problem, and warm fail-over is automatic. What would I need to change in
>> > the
>> > NameNode to tell it to use the hard drive?
>> >
>> > Thank you,
>> > Mark
>> --
>> Harsh J

Lance Norskog

View raw message