hbase-user mailing list archives

From Imran M Yousuf <imyou...@gmail.com>
Subject Re: About test/production server configuration
Date Tue, 06 Apr 2010 00:21:44 GMT
On Tue, Apr 6, 2010 at 12:26 AM, Andrew Purtell <apurtell@apache.org> wrote:
> The below from Patrick is not uncommon to encounter.
> The "commodity hardware" talk around MR and BigTable is a bit of a joke -- you can do
> that if you can afford 1,000s or 10,000s of commodity components custom assembled. Hadoop+HBase
> users want to do more with less, obviously. Colocating computation with storage has its price
> -- either you scale wide horizontally or go vertical enough on each node to handle the load
> you are throwing at the cluster you can afford.

Now that is getting me worried :(. We were not prepared for this.

> Sizing clusters is a black art.

Hmm, with this I do agree! One reason we decided on HBase is that the
community is just absolutely great, and we are banking on this
community support, with an outlook to give back to the community as
much as we can too...

> As for the spec of each individual node, I can share our current generation hardware:
>   CPU: dual 6-core AMD (12 cores total)
>   RAM: 32 GB
>   DISK: 320 GB x 2 (RAID-1) system disk
>         500 GB x 8 (JBOD) data disks for HDFS
>   custom 1U chassis
>   We give 8 GB of RAM to the HBase region servers. All other Hadoop and HBase daemons
> (DataNode, ZooKeeper, TaskTracker, etc.) use the default of 1 GB. The remainder of CPU
> and RAM is for user tasks (MR).
>   Reads are best served from RAM via the block cache.
>   The more spindles, the higher the I/O parallelism, and therefore the higher the aggregate throughput.
>   The above is a good trade-off between horizontal and vertical for us.
> Hope that helps.
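[Editor's note: Andrew's memory split can be expressed in HBase's conf/hbase-env.sh. A minimal sketch, assuming nodes sized like his; the 8 GB figure is from his mail, everything else here is illustrative, not a tuned production config:]

```shell
# conf/hbase-env.sh -- sketch only, sized per Andrew's 32 GB nodes
# Give the region server JVM an 8 GB heap (fixed min/max avoids resize pauses);
# master, ZooKeeper, and the Hadoop daemons keep their ~1 GB defaults.
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xmx8g -Xms8g"
```

The remaining ~20 GB of RAM stays free for MapReduce task JVMs and the OS page cache.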

This is very helpful! It gives us at least some idea.

Thanks Patrick and Andrew.


>> From: Patrick Hunt
>> Subject: Re: About test/production server configuration
>> The ZK servers are sensitive to disk
>> (io) latency. I just troubleshot an
>> issue last week where a user was seeing 80 second (second!)
>> latencies. Turns out they were running zk server, namenode,
>> tasktracker, and hbase region server all on the same box,
>> that box had a single spindle for all io activity and was
>> at 100% utilization for long periods of time. If
>> you want decent ZK API latencies (<100ms) you really
>> need to ensure that there's at least a separate spindle
>> available for the ZK transaction logs.
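[Editor's note: Patrick's advice about a separate spindle for the transaction log maps to the dataLogDir setting in ZooKeeper's zoo.cfg. A sketch only; the mount points below are made up for illustration:]

```properties
# zoo.cfg -- illustrative paths, assuming two separate physical disks
dataDir=/disk1/zookeeper/data      # snapshots; can share a disk with other io
dataLogDir=/disk2/zookeeper/txlog  # txn log on its own spindle, away from HDFS/MR traffic
```

With dataLogDir unset, ZooKeeper writes the transaction log into dataDir, which is exactly the single-spindle contention Patrick describes.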

Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: imran@smartitengineering.com
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557
