hadoop-common-user mailing list archives

From lohit <lohit...@yahoo.com>
Subject Re: NameNode hardware specs
Date Tue, 12 Aug 2008 19:07:01 GMT
Hi Manish,

>- why 15+ GBs?  Do we allocate all memory to the NameNode? or  
>just allocate some number using -Xmx and leave the rest available so  
>the machine doesn't start swapping?


We allocate memory using -Xmx. The NameNode stores the HDFS namespace in memory, so the bigger
your namespace, the bigger your heap needs to be. My guess is that if you have more than 15 million
files with 20 million blocks, you might need such a big system. But again, it's best to watch
how your namenode is performing and how much memory it is consuming.
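
As a rough illustration of how namespace size drives heap, here is a minimal sketch of the
arithmetic. The function name is hypothetical, and the per-object cost is an assumption (a
commonly quoted rule of thumb is on the order of 150 bytes per file or block object), not a
figure from this thread, so treat the result as a lower bound and measure your own namenode.

```python
def estimate_namenode_heap_bytes(num_files, num_blocks, bytes_per_object=150):
    """Rough heap estimate for the in-memory namespace.

    Each file and each block is counted as one in-memory object.
    bytes_per_object is an ASSUMED rule-of-thumb value, not a measurement.
    """
    return (num_files + num_blocks) * bytes_per_object


if __name__ == "__main__":
    # The namespace size mentioned above: 15 million files, 20 million blocks.
    heap = estimate_namenode_heap_bytes(15_000_000, 20_000_000)
    print(f"~{heap / 2**30:.1f} GiB for namespace objects alone")
```

The estimate covers namespace objects only. The -Xmx value needs headroom beyond it, and the
machine needs RAM beyond -Xmx so that it does not start swapping.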

>  - why RAID5?
> - If running RAID 5, why is this necessary?
Not absolutely necessary. The namenode index (metadata) is a critical piece of data; you
cannot afford to lose or corrupt it. That is the reason we have the option of specifying multiple
directories, so that copies are kept in parallel. You can point those directories at whatever
storage you like: multiple drives, NFS, ...
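
Since the point of multiple directories is that the copies do not share a failure, a quick
sanity check is to confirm that the configured directories sit on different devices. The sketch
below assumes the dfs.name.dir value is a comma-separated list of local paths; the helper names
are hypothetical. Note that distinct device IDs do not prove distinct physical disks (two
partitions of one drive also differ), so this only catches the obvious mistake.

```python
import os
from collections import defaultdict


def group_dirs_by_device(name_dirs):
    """Group the paths in a comma-separated dfs.name.dir value by device ID."""
    groups = defaultdict(list)
    for path in (p.strip() for p in name_dirs.split(",")):
        if not path:
            continue  # tolerate stray commas or blanks
        groups[os.stat(path).st_dev].append(path)
    return dict(groups)


def shares_a_device(name_dirs):
    """Return True if any two configured directories are on the same device."""
    groups = group_dirs_by_device(name_dirs)
    return any(len(paths) > 1 for paths in groups.values())


if __name__ == "__main__":
    # Example paths are placeholders; substitute your own dfs.name.dir value.
    value = "/tmp,/var/tmp"
    for device, paths in group_dirs_by_device(value).items():
        print(device, paths)
```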

>- Configure the name node to store one set of transaction logs on a  
>separate disk from the index.
> why?
This feature is not yet supported, but it would be a good one to have. Right now both the transaction
logs and the index (I am assuming this means the image) are in the same directory and cannot be
configured to be placed in separate directories. We should correct the wiki.

> - Configure the name node to store another set of transaction logs to  
> a network mounted disk.
>      - why?
As explained above, this is to have multiple copies of your metadata (dfs.name.dir in particular).

>- Do not host DataNode, JobTracker or TaskTracker services on the  
>same system.
Typically DataNode and TaskTracker run on all nodes, while JobTracker runs on a dedicated
node, like the NameNode (SecondaryNameNode).
Sometimes a TaskTracker might crash and bring down a node, and you do not want your JobTracker
or NameNode to be on that system.

PS: Could you point to the wiki you are referring to? We might need to make some corrections.

Thanks,
Lohit

----- Original Message ----
From: Manish Shah <manish@rapleaf.com>
To: core-user@hadoop.apache.org
Sent: Tuesday, August 12, 2008 11:24:45 AM
Subject: NameNode hardware specs

Can someone help explain in a little more detail some of the reasons  
for the hardware specs that were recently added to the wiki for the  
NameNode.  I guess I'm interested in learning how others have settled
on these specs?  Is it by observed behavior, or just recommended by  
other hadoop users?

- Use a good server with lots (15GB+) of RAM.
      - why 15+ GBs?  Do we allocate all memory to the NameNode? or  
just allocate some number using -Xmx and leave the rest available so  
the machine doesn't start swapping?

- Consider using fast RAID5 storage for keeping the index.
      - why RAID5?

- List more than one name node directory in the configuration, so  
that multiple copies of the indices will be stored. As long as the  
directories are on separate disks, a single full disk will not  
corrupt the index.
      - If running RAID 5, why is this necessary?

- Configure the name node to store one set of transaction logs on a  
separate disk from the index.
      - why?

- Configure the name node to store another set of transaction logs to  
a network mounted disk.
      - why?

- Do not host DataNode, JobTracker or TaskTracker services on the  
same system.
      - how much memory would the job tracker need?  Does it use a  
lot of CPU? In general, what are good specs for a job tracker machine  
and can the machine be shared with other services?

Thanks so much for the help.  I think it would be hugely helpful for  
the community to start describing their respective setups for hadoop  
clusters in more detail than just the config for datanodes and  
cluster size.  I think we all want to be confident that we are  
spending money on the right machines to grow our cluster the right way.


Most appreciated,

- Manish
Co-Founder Rapleaf.com

We're looking for a product manager, sys admin, and software  
engineers...$10K referral award
