hadoop-common-user mailing list archives

From Manish Shah <man...@rapleaf.com>
Subject Re: NameNode hardware specs
Date Tue, 12 Aug 2008 19:26:48 GMT
The page I'm referring to is:

http://wiki.apache.org/hadoop/NameNode

- Manish
Co-Founder Rapleaf.com

We're looking for a product manager, sys admin, and software  
engineers...$10K referral award

On Aug 12, 2008, at 12:07 PM, lohit wrote:

> Hi Manish,
>
>> - why 15+ GBs?  Do we allocate all memory to the NameNode? or
>> just allocate some number using -Xmx and leave the rest available so
>> the machine doesn't start swapping?
>
>
> We allocate memory using -Xmx. The NameNode stores the HDFS namespace
> in memory, so the bigger your namespace, the bigger your heap needs
> to be. My guess is that if you have more than 15 million files with
> 20 million blocks, you might need such a big system. But again, it's
> best to see how your NameNode is performing and how much memory it
> is consuming.
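
A rough way to turn the namespace-size reasoning above into a number is to multiply the count of in-memory objects by a per-object cost and pad the result before passing it to -Xmx. The sketch below is only an illustration: the function names are hypothetical, the per-object cost of 150 bytes is an assumed placeholder that does not come from this thread, and directories are ignored. Measure real heap usage on your own NameNode before sizing hardware.

```python
import math


def estimate_namenode_heap_bytes(num_files, num_blocks, bytes_per_object):
    """Rough heap estimate: one in-memory object per file and per block."""
    return (num_files + num_blocks) * bytes_per_object


def xmx_flag(heap_bytes, headroom=1.5):
    """Format a -Xmx flag in megabytes, padded by a headroom factor."""
    megabytes = math.ceil(heap_bytes * headroom / (1024 * 1024))
    return "-Xmx%dm" % megabytes


if __name__ == "__main__":
    # 150 bytes per object is an assumed placeholder, not a measured value.
    heap = estimate_namenode_heap_bytes(15000000, 20000000, 150)
    print(heap, xmx_flag(heap))
```

Whatever value the estimate gives, the machine still needs RAM beyond the heap so that the operating system does not start swapping.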
>
>>  - why RAID5?
>> - If running RAID 5, why is this necessary?
> Not absolutely necessary. The NameNode index (its metadata) is a
> critical piece of data. You cannot afford to lose or corrupt it.
> That is the reason we have an option to specify multiple
> directories, so that different copies are kept in parallel. You can
> put those directories wherever you like: multiple drives, NFS, and
> so on.
>
>> - Configure the name node to store one set of transaction logs on a
>> separate disk from the index.
>> why?
> This feature is not yet supported, but it would be a good one to
> have. Right now both the transaction logs and the index (I am
> assuming this means the image) are in the same directory and cannot
> be configured to be placed in separate directories. We should
> correct the wiki.
>
>> - Configure the name node to store another set of transaction logs to
>> a network mounted disk.
>>      - why?
> As explained above, this is to have multiple copies of your
> metadata (dfs.name.dir in particular).
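
To make the multiple-copies idea concrete: dfs.name.dir accepts a comma-separated list of directories, and the NameNode keeps a copy of its metadata in each one. The sketch below only builds the property fragment you would paste into your Hadoop site configuration file. The helper name and both example paths (a local disk and an NFS mount) are hypothetical, chosen for illustration.

```python
from xml.sax.saxutils import escape


def name_dir_property(directories):
    """Render a dfs.name.dir <property> fragment from a list of directories."""
    if len(directories) < 2:
        # A single directory gives no redundancy for the metadata.
        raise ValueError("list at least two directories, ideally on separate disks")
    value = ",".join(directories)
    return (
        "<property>\n"
        "  <name>dfs.name.dir</name>\n"
        "  <value>%s</value>\n"
        "</property>" % escape(value)
    )


if __name__ == "__main__":
    # Hypothetical paths: one local disk and one network-mounted directory.
    print(name_dir_property(["/data/1/dfs/name", "/mnt/nfs/dfs/name"]))
```

Putting one of the directories on a network mount means a copy of the metadata survives even if the NameNode machine's local disks are lost.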
>
>> - Do not host DataNode, JobTracker or TaskTracker services on the
>> same system.
> Typically DataNode and TaskTracker run on all nodes, while the
> JobTracker runs on a dedicated node, like the NameNode (SecondaryNameNode).
> Sometimes a TaskTracker might crash and bring down a node, and you do
> not want your JobTracker or NameNode to be on that system.
>
> PS: Could you point to the wiki you are referring to? We might need  
> to make some corrections.
>
> Thanks,
> Lohit
>
> ----- Original Message ----
> From: Manish Shah <manish@rapleaf.com>
> To: core-user@hadoop.apache.org
> Sent: Tuesday, August 12, 2008 11:24:45 AM
> Subject: NameNode hardware specs
>
> Can someone help explain in a little more detail some of the reasons
> for the hardware specs that were recently added to the wiki for the
> NameNode.  I guess I'm interested in learning how others have settled
> on these specs?  Is it by observed behavior, or just recommended by
> other hadoop users?
>
> - Use a good server with lots (15GB+) of RAM.
>       - why 15+ GBs?  Do we allocate all memory to the NameNode? or
> just allocate some number using -Xmx and leave the rest available so
> the machine doesn't start swapping?
>
> - Consider using fast RAID5 storage for keeping the index.
>       - why RAID5?
>
> - List more than one name node directory in the configuration, so
> that multiple copies of the indices will be stored. As long as the
> directories are on separate disks, a single full disk will not
> corrupt the index.
>       - If running RAID 5, why is this necessary?
>
> - Configure the name node to store one set of transaction logs on a
> separate disk from the index.
>       - why?
>
> - Configure the name node to store another set of transaction logs to
> a network mounted disk.
>       - why?
>
> - Do not host DataNode, JobTracker or TaskTracker services on the
> same system.
>       - how much memory would the job tracker need?  Does it use a
> lot of CPU? In general, what are good specs for a job tracker machine
> and can the machine be shared with other services?
>
> Thanks so much for the help.  I think it would be hugely helpful for
> the community to start describing their respective setups for hadoop
> clusters in more detail than just the config for datanodes and
> cluster size.  I think we all want to be confident that we are
> spending money on the right machines to grow our cluster the right  
> way.
>
>
> Most appreciated,
>
> - Manish
> Co-Founder Rapleaf.com
>
> We're looking for a product manager, sys admin, and software
> engineers...$10K referral award

