hadoop-common-user mailing list archives

From Patrick Angeles <patrickange...@gmail.com>
Subject Re: hadoop hardware configuration
Date Thu, 28 May 2009 19:00:52 GMT
On Thu, May 28, 2009 at 10:24 AM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:

> We do both -- push the disk image out to NFS and have mirrored SAS hard
> drives on the namenode.  The SAS drives appear to be overkill.

This sounds like a nice approach, taking into account hardware, labor, and
downtime costs... $700 for a RAID controller seems reasonable to minimize
maintenance due to a disk failure. Alex's suggestion to go JBOD and write to
all volumes would work as well, but it is slightly more labor-intensive.
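For reference, the "write to all volumes" idea maps to a standard HDFS setting: the NameNode will write its image and edit log to every directory listed in dfs.name.dir. A minimal sketch (hdfs-site.xml, Hadoop 0.20-era property name; the paths are illustrative assumptions, not from this thread):

```xml
<!-- Sketch only: list one or more local disks plus an NFS mount.
     The NameNode writes metadata to each directory, so the NFS copy
     survives a local disk failure without a RAID controller. -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/nn,/data/2/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>
```

The trade-off Brian and Alex are describing: RAID makes the redundancy invisible to the operator, while the multi-directory approach is free but requires manual recovery steps when a volume dies.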

>> 2. What is a good processor-to-storage ratio for a task node with 4TB of
>> raw storage? (The config above has 1 core per 1TB of raw storage.)
> We're data hungry locally -- I'd put in bigger hard drives.  The 1.5TB
> Seagate drives seem to have passed their teething issues, and are at a
> pretty sweet price point.  They will only scale up to 60 IOPS, so make sure
> your workflows don't have lots of random I/O.
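To put the 60 IOPS warning in perspective, here is a back-of-envelope comparison of random versus sequential throughput. The 64 KB random-read size and ~100 MB/s sequential rate are my own assumptions for a drive of that era, not figures from the thread:

```python
# Why 60 IOPS caps random-I/O workloads: at small random reads,
# the drive delivers a tiny fraction of its sequential bandwidth.
IOPS = 60                # quoted ceiling for the 1.5TB drives
RANDOM_READ_KB = 64      # assumed average random read size
SEQUENTIAL_MB_S = 100    # assumed sequential streaming rate

random_mb_s = IOPS * RANDOM_READ_KB / 1024   # MB/s under random access
slowdown = SEQUENTIAL_MB_S / random_mb_s

print(f"random:     {random_mb_s:.2f} MB/s")      # 3.75 MB/s
print(f"sequential: ~{slowdown:.0f}x faster")
```

Under these assumptions a random-access workflow sees under 4 MB/s per spindle, which is why MapReduce jobs are written to stream sequentially.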

I haven't seen too many vendors offering the 1.5TB option. What type of data
are you working with? At what volumes? I sense that at 50GB/day, we are
higher than average in terms of data volume over time.
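A quick growth check on that 50GB/day figure, assuming HDFS's default 3x replication and ignoring temp/spill overhead (both assumptions are mine, not from the thread):

```python
# Raw capacity consumed per year at 50 GB/day of new data,
# with 3x HDFS replication. Spill/temp space is ignored.
DAILY_GB = 50
REPLICATION = 3

raw_gb_per_year = DAILY_GB * REPLICATION * 365
print(raw_gb_per_year)            # 54750 GB
print(raw_gb_per_year / 1024)     # ~53.5 TB of raw capacity per year
```

So at this rate the cluster needs roughly 50TB of new raw disk every year, which is why the drive size per node matters so much.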

> As Steve mentions below, the rest is really up to your algorithm.  Do you
> need 1 CPU second / byte?  If so, buy more CPUs.  Do you need .1 CPU second
> / MB?  If so, buy more disks.
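The sizing rule in that paragraph can be sketched as a one-line formula: the cores needed per node is the CPU cost per MB times the disk streaming rate per node. The 80 MB/s disk rate and the two per-MB costs below are illustrative assumptions, not measurements:

```python
# CPU-vs-disk sizing: cores needed so the CPUs keep up with the disks.
def cores_to_saturate(cpu_s_per_mb, disk_mb_s):
    """Cores required to process disk_mb_s MB/s at cpu_s_per_mb CPU-seconds/MB."""
    return cpu_s_per_mb * disk_mb_s

DISK_MB_S = 80  # assumed aggregate streaming rate per node

light = cores_to_saturate(0.01, DISK_MB_S)  # cheap per-MB work
heavy = cores_to_saturate(1.0, DISK_MB_S)   # expensive per-MB work
print(light, heavy)  # 0.8 80.0
```

At 0.8 cores per node the job is disk-bound (buy more disks); at 80 cores it is CPU-bound (buy more CPUs), which is the decision Brian is describing.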

Unfortunately, we won't know until we have a cluster to test on. Classic
catch-22. We are going to experiment with a small cluster and a small data
set, with plans to buy more appropriately sized slave nodes based on what we
learn.

- P
