hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Hardware Setup
Date Thu, 15 Oct 2009 16:32:38 GMT
On Thu, Oct 15, 2009 at 12:12 PM, Patrick Angeles
<patrickangeles@gmail.com> wrote:
> After the discount, an equivalently configured Dell comes about 10-20% over
> the Silicon Mechanics price. It's close enough that unless you're spending
> 100k it won't make that much of a difference. Talk to a rep, call them out
> on the ridiculous drive pricing, buy at the end of their fiscal quarter.
> Strip down the machines (no RAID cards, no DVD/CD drive, non-redundant power
> supply, etc.) to get the price lower. No need for dedicated SATA drives with
> RAID for your OS. Most of that is accessed during boot time so it won't
> contend that much with HDFS.
>
> We just got a bunch of Dell R410s with 24GB ram, 2x2.26Ghz procs and 4x1TB
> drives.
>
> I would go for beefier nodes with less quantity.  Of course, some of this
> depends on the volume of data and type of processing that you do. If you're
> running HBase, you would benefit from lots of RAM. You also have to remember
> that dual socket configs are more power efficient so you can fit more in a
> single rack.
>
> Cheers,
>
> - P
>
> On Thu, Oct 15, 2009 at 11:48 AM, Alex Newman <posix4e@gmail.com> wrote:
>
>>          So my company is looking at only using dell or hp for our
>> hadoop cluster and a sun thumper to backup the data. The prices are
>> ok, after a 40% discount, but realistically I am paying twice as much
>> as if I went to silicon mechanics, and with a much slower machine. It
>> seems as though the big expense are the disks. Even with a 40%
>> discount 550$ per 1tb disk seems crazy expensive. Also, they are
>> pushing me to build a smaller cluster (6 nodes) and I am pushing back
>> for nodes half the size but having twice as many. So how much of a
>> performance difference can I expect btwn 12 nodes with 1 xeon 5 series
>> running at 2.26 ghz 8 gigs of ram with 4 1 tb disks and a 6 node
>> cluster with 2 xeon 5 series running at 2.26 16 gigs of ram with 8 1
>> tb disks. Both setups will also have 2 very small sata drives in raid
>> 1 for the OS. I will be doing some stuff with hadoop and a lot of
>> stuff with HBase. What are the considerations with HDFS performance
>> with a low number of nodes,etc.
>>
>

>>No need for dedicated SATA drives with
>>RAID for your OS. Most of that is accessed during boot time so it won't
>>contend that much with HDFS.

You may want to RAID your OS. If you lose a datanode holding a large
volume of data (say, 8 TB), Hadoop will begin re-replicating that data
elsewhere, which uses cluster resources.

You MIGHT want to avoid that, or maybe you do not care.
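
To get a feel for whether you care, here is a rough back-of-the-envelope
sketch in Python. The function name and the 20 MB/s per-node copy rate are
my own assumptions for illustration, not measured Hadoop figures, and it
assumes the surviving nodes share the copy work evenly.

```python
def rereplication_hours(lost_tb, surviving_nodes, mb_per_sec_per_node):
    """Rough hours needed to re-copy lost_tb of blocks across the survivors.

    Assumes (hypothetically) that every surviving node contributes
    mb_per_sec_per_node of copy throughput and the work is spread evenly.
    """
    lost_mb = lost_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    cluster_mb_per_sec = surviving_nodes * mb_per_sec_per_node
    return lost_mb / cluster_mb_per_sec / 3600


if __name__ == "__main__":
    # Losing one full 8 TB node out of 12 vs. one out of 6,
    # at an assumed 20 MB/s per surviving node.
    for survivors in (11, 5):
        hours = rereplication_hours(8, survivors, 20)
        print(f"{survivors} surviving nodes: ~{hours:.1f} hours")
```

With those assumed numbers, a 12-node cluster recovers in roughly 10 hours
and a 6-node cluster in roughly 22, so the smaller the cluster, the longer
a lost node hurts.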

Having 2 disks for the OS is a waste of bays, so we got clever. Take a
system with 8 drives @ 1TB. Slice off ~30 GB from two of the disks and
use Linux software RAID-1 (mirroring) for the OS + swap.

Now you don't need two separate disks for the OS, and you don't run the
risk that losing one disk takes down the entire DataNode.
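
The arithmetic behind the bay argument, as a small Python sketch. The
function names are mine, and it assumes the dedicated OS mirror would
occupy two of the eight bays, with disks counted as 1000 GB each.

```python
def capacity_dedicated_os_gb(bays, disk_gb, os_disks=2):
    """HDFS capacity when os_disks whole bays are given to the OS mirror."""
    return (bays - os_disks) * disk_gb


def capacity_sliced_os_gb(bays, disk_gb, os_slice_gb, mirrored_disks=2):
    """HDFS capacity when the OS lives on a small mirrored slice.

    The slice is carved out of each of the mirrored_disks; the rest of
    those disks, and every other disk, stays available for HDFS.
    """
    return bays * disk_gb - mirrored_disks * os_slice_gb


if __name__ == "__main__":
    dedicated = capacity_dedicated_os_gb(bays=8, disk_gb=1000)
    sliced = capacity_sliced_os_gb(bays=8, disk_gb=1000, os_slice_gb=30)
    print(f"dedicated OS disks: {dedicated} GB for HDFS")
    print(f"sliced OS mirror:   {sliced} GB for HDFS")
```

Under those assumptions the sliced layout leaves 7940 GB for HDFS per node
against 6000 GB with two dedicated OS disks.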
