hadoop-common-user mailing list archives

From John Martyniak <j...@beforedawnsolutions.com>
Subject Cluster Machines
Date Tue, 03 Nov 2009 13:25:46 GMT

I am in the process of setting up a Hadoop cluster, starting small at
first but growing rapidly.  I plan on running the following: Hadoop,
HDFS, HBase, Nutch and Mahout.  I am starting with 2 machines to get
all of the mechanics worked out, and then growing the cluster.

The first two machines that I have are Dell SC1425s, each with dual
2.8 GHz processors, 4 GB of RAM, and two 1.5 TB drives (JBOD), all
sitting on a gigabit switch.

I guess I have a couple of questions:

1) Should each node have RAID 1, or is it sufficient to have HDFS take  
care of that?  Because for each node I could put an 80 GB drive for the
boot drive and leave one of the 1.5 TBs for the data drive that Hadoop  

2) As I grow the system is it necessary to have all nodes with the  
same config?  Is there any benefit or problem either way?  The way
that I have been approaching it is to get nodes that I can get the
best deal on that have decent performance.  So if future boxes have
dual or quad core, will that cause some problem, management or

3) For the hard disk sizes, if some of the boxes have 1.5 TB drives and
other boxes have, say, 300 GB, will HDFS have an issue managing that?

Thanks in advance for the help.


John Martyniak
Before Dawn Solutions, Inc.
9457 S. University Blvd #266
Highlands Ranch, CO 80126
o: 877-499-1562
e: john@beforedawnsolutions.com
w: http://www.beforedawnsolutions.com
