hadoop-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Using a hard drive instead of
Date Wed, 17 Oct 2012 23:27:52 GMT

If you are worried about the memory constraints of a Linux system, I'd say go with MapR and
their CLDB. 

I just did a quick look at Supermicro servers and found that 768GB was the max memory on a 2U server.
So how many blocks can you store in that much memory? I only have 10 fingers and toes so I
can't count that high. ;-)

Assuming that you use 64MB blocks, what's the max size? 
Switching to 128MB blocks, what's the max size then?

From Tom White's blog: "Every file, directory and block in HDFS is represented as an object
in the namenode's memory, each of which occupies 150 bytes, as a rule of thumb. So 10
million files, each using a block, would use about 3 gigabytes of memory. Scaling up much
beyond this level is a problem with current hardware. Certainly a billion files is not feasible."
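A quick back-of-the-envelope sketch of that rule of thumb (the 150-bytes-per-object figure is White's estimate, not a measured constant; a file that occupies one block costs two objects, the file entry plus the block record):

```python
# Namenode heap needed for N files, per Tom White's ~150 bytes/object rule of thumb.
BYTES_PER_OBJECT = 150  # rule of thumb, not a measured constant

def namenode_heap_bytes(num_files, blocks_per_file=1):
    # Each file contributes one file object plus its block objects.
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

# 10 million single-block files -> about 3 GB, matching the quote.
print(f"{namenode_heap_bytes(10_000_000) / 1e9:.1f} GB")
```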

So 10 million blocks per 3GB. 600GB / 3GB = 200, so 200 * 1x10^7 = 2x10^9 blocks in 600GB of memory. 

That's 2 Billion blocks. 
At 64MB that's 128 Billion MBs which off the top of my head, is 128 PB? 
(Ok, I'll admit that makes my head spin so someone may want to check my math....)
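Checking that arithmetic in a few lines of Python (assuming the 10-million-blocks-per-3GB ratio above and decimal units throughout):

```python
# How much raw data can a cluster address if the namenode has
# 600 GB of heap for metadata?
HEAP_GB = 600
GB_PER_10M_BLOCKS = 3  # 10 million blocks per ~3 GB of heap

blocks = 10_000_000 * (HEAP_GB // GB_PER_10M_BLOCKS)  # 2 billion blocks

for block_mb in (64, 128):
    pb = blocks * block_mb / 1_000_000_000  # MB -> PB (decimal)
    print(f"{block_mb} MB blocks: {pb:.0f} PB")
```

So with 64MB blocks you'd top out around 128 PB, and doubling the block size doubles that.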

The point is that it's possible to build out your control nodes with more than enough memory
to handle the largest cluster you can build with Map/Reduce 1.x (pre-YARN).

I am skeptical of Federation. 

Just Saying... 


On Oct 17, 2012, at 5:37 PM, Colin Patrick McCabe <rarecactus@gmail.com> wrote:

> The direct answer to your question is to use this theoretical super-fast hard drive as
> Linux swap space.
> The better answer is to use federation or another solution if your needs exceed those
> servable by a single NameNode.
> Cheers.
> Colin
> On Oct 11, 2012 9:00 PM, "Mark Kerzner" <mark.kerzner@shmsoft.com> wrote:
> Hi,
> Imagine I have a very fast hard drive that I want to use for the NameNode. That is, I
> want the NameNode to store its block information on this hard drive instead of in memory.
>
> Why would I do it? Scalability (no federation needed), many files are not a problem,
> and warm fail-over is automatic. What would I need to change in the NameNode to tell it to
> use the hard drive?
> Thank you,
> Mark
