hadoop-common-user mailing list archives

From Marcos Luis Ortiz Valmaseda <marcosluis2...@gmail.com>
Subject Re: Hardware Selection for Hadoop
Date Mon, 29 Apr 2013 16:35:24 GMT
Regards, Raj. Knowing the data you want to process with Hadoop is critical
here, or at least having an approximation of its volume. I think that
Hadoop Operations is an invaluable resource for this:

- Hadoop uses RAM heavily, so the first resource to consider is giving the
nodes as much RAM as you can, with a particular focus on the
NameNode/JobTracker node.

- For the DataNode/TaskTracker nodes, fast disks such as SSDs are very
good, but they are expensive, so weigh the cost. Personally, I find the
Seagate Barracuda drives excellent.

- A good network connection between the nodes. Hadoop is an RPC-based
platform, so a good network is critical for a healthy cluster.

A good starting point, in my view, for a small cluster:

- NN/JT: 8 to 16 GB RAM
- DN/TT: 4 to 8 GB RAM
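
As a sketch, the RAM budget above might translate into daemon heap settings
in hadoop-env.sh roughly like the following (the -Xmx values here are
illustrative assumptions for a small PoC cluster, not tuned recommendations;
adjust them to your actual node RAM):

```shell
# hadoop-env.sh -- illustrative heap sizes for a small PoC cluster
# (values are assumptions, not tuned figures; leave room for the OS)

# On the NameNode/JobTracker host (8-16 GB RAM):
export HADOOP_NAMENODE_OPTS="-Xmx4g $HADOOP_NAMENODE_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Xmx2g $HADOOP_JOBTRACKER_OPTS"

# On the DataNode/TaskTracker hosts (4-8 GB RAM):
export HADOOP_DATANODE_OPTS="-Xmx1g $HADOOP_DATANODE_OPTS"
export HADOOP_TASKTRACKER_OPTS="-Xmx1g $HADOOP_TASKTRACKER_OPTS"
```

Remember that the per-task JVMs (mapred.child.java.opts) also come out of
the DN/TT node's RAM, so size the daemon heaps with that in mind.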

Always consider using compression to optimize the communication between
all the services in your Hadoop cluster (Snappy is my favorite).
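
For example, compressing intermediate map output with Snappy is usually a
cheap win. A minimal mapred-site.xml fragment would look something like
this (a sketch for a Hadoop 1.x cluster; it assumes the native Snappy
libraries are installed on every node):

```xml
<!-- mapred-site.xml: compress intermediate map output with Snappy -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```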

All of this advice is in Eric Sammer's Hadoop Operations book, so it's a
must-read for every Hadoop systems engineer.

2013/4/29 Raj Hadoop <hadoopraj@yahoo.com>

>    Hi,
> I have to propose some hardware requirements in my company for a Proof of
> Concept with Hadoop. I was reading Hadoop Operations and also saw the
> Cloudera website. But I just wanted to know from the group - what are the
> requirements if I have to plan for a 5 node cluster? I don't know at this
> time the data that needs to be processed for the Proof of Concept. So - can
> you suggest something to me?
> Regards,
> Raj

Marcos Ortiz Valmaseda,
*Data-Driven Product Manager* at PDVSA
*Blog*: http://dataddict.wordpress.com/
*LinkedIn*: http://www.linkedin.com/in/marcosluis2186
*Twitter*: @marcosluis2186 <http://twitter.com/marcosluis2186>
