hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Hardware Setup
Date Thu, 15 Oct 2009 16:32:38 GMT
On Thu, Oct 15, 2009 at 12:12 PM, Patrick Angeles
<patrickangeles@gmail.com> wrote:
> After the discount, an equivalently configured Dell comes about 10-20% over
> the Silicon Mechanics price. It's close enough that unless you're spending
> 100k it won't make that much of a difference. Talk to a rep, call them out
> on the ridiculous drive pricing, buy at the end of their fiscal quarter.
> Strip down the machines (no RAID cards, no DVD/CD drive, non-redundant power
> supply, etc.) to get the price lower. No need for dedicated SATA drives with
> RAID for your OS. Most of that is accessed during boot time so it won't
> contend that much with HDFS.
> We just got a bunch of Dell R410s with 24GB ram, 2x2.26Ghz procs and 4x1TB
> drives.
> I would go for beefier nodes with less quantity.  Of course, some of this
> depends on the volume of data and type of processing that you do. If you're
> running HBase, you would benefit from lots of RAM. You also have to remember
> that dual socket configs are more power efficient so you can fit more in a
> single rack.
> Cheers,
> - P
> On Thu, Oct 15, 2009 at 11:48 AM, Alex Newman <posix4e@gmail.com> wrote:
>>          So my company is looking at only using dell or hp for our
>> hadoop cluster and a sun thumper to backup the data. The prices are
>> ok, after a 40% discount, but realistically I am paying twice as much
>> as if I went to silicon mechanics, and with a much slower machine. It
>> seems as though the big expense are the disks. Even with a 40%
>> discount 550$ per 1tb disk seems crazy expensive. Also, they are
>> pushing me to build a smaller cluster (6 nodes) and I am pushing back
>> for nodes half the size but having twice as many. So how much of a
>> performance difference can I expect btwn 12 nodes with 1 xeon 5 series
>> running at 2.26 ghz 8 gigs of ram with 4 1 tb disks and a 6 node
>> cluster with 2 xeon 5 series running at 2.26 16 gigs of ram with 8 1
>> tb disks. Both setups will also have 2 very small sata drives in raid
>> 1 for the OS. I will be doing some stuff with hadoop and a lot of
>> stuff with HBase. What are the considerations with HDFS performance
>> with a low number of nodes,etc.

>>No need for dedicated SATA drives with
>>RAID for your OS. Most of that is accessed during boot time so it won't
>>contend that much with HDFS.

You may want to RAID your OS. If you lose a DataNode holding a large
volume of data (say 8 TB), Hadoop will begin re-replicating that data
elsewhere, and that can use cluster resources.

You MIGHT want to avoid that, or maybe you do not care.
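
To put a rough number on that re-replication, here is a back-of-the-envelope
sketch in Python. The 50 MB/s per-node rate and the even spread of work over
the surviving nodes are assumptions made purely for illustration; real
clusters throttle re-replication and will not match these figures.

```python
def rereplication_hours(lost_tb, surviving_nodes, per_node_mb_s):
    """Rough hours to re-copy lost_tb of blocks, assuming the work is
    spread evenly over the surviving nodes (an idealization)."""
    lost_mb = lost_tb * 1_000_000            # decimal TB -> MB
    cluster_mb_s = surviving_nodes * per_node_mb_s
    return lost_mb / cluster_mb_s / 3600     # seconds -> hours

# 6-node cluster, one 8 TB node lost, 5 survivors at an assumed 50 MB/s each
print(round(rereplication_hours(8, 5, 50), 1))   # 8.9

# 12-node cluster, one 4 TB node lost, 11 survivors at the same assumed rate
print(round(rereplication_hours(4, 11, 50), 1))  # 2.0
```

Under these assumptions the smaller, denser cluster spends several times
longer recovering from a single dead node, which is the cost a mirrored OS
is meant to avoid.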

Having 2 disks for the OS is a waste of bays, so we got clever. Take a
system with 8 drives @ 1TB. Slice off ~30 GB from two of the disks and
use Linux software RAID-1 (mirror) for the OS + swap.

Now you don't need two separate disks for the OS, and you don't run the
risk of losing the one disk whose failure takes down the entire DataNode.
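
As a quick sanity check on the bay arithmetic, here is a small Python sketch
comparing the two layouts. The function name and parameters are made up for
illustration, and it assumes 1 TB = 1000 GB and ignores filesystem overhead.

```python
def hdfs_capacity_gb(bays, drive_gb, os_slice_gb=0, dedicated_os_disks=0):
    """Raw GB left for HDFS data on one node.

    os_slice_gb is carved from each of two data disks (the RAID-1 pair);
    dedicated_os_disks is the number of bays given up entirely to the OS.
    """
    data_disks = bays - dedicated_os_disks
    sliced_disks = 2 if os_slice_gb else 0
    return data_disks * drive_gb - sliced_disks * os_slice_gb

# Two of the 8 bays dedicated to a mirrored OS pair
print(hdfs_capacity_gb(8, 1000, dedicated_os_disks=2))  # 6000

# All 8 bays hold 1 TB data disks, with a ~30 GB mirrored slice on two of them
print(hdfs_capacity_gb(8, 1000, os_slice_gb=30))        # 7940
```

The sliced layout keeps the OS mirrored while giving up only about 60 GB of
raw space instead of two whole bays.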
