hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: HBase: minimal number of boxes?
Date Fri, 15 Jan 2010 19:51:57 GMT
Hi Otis,

> * Is it OK for HBase Master and NameNode (+JobTracker) to run on
> the same server? NN needs memory.  What does HBase Master need
> the most?

The HBase Master is normally not very busy. It just needs to be
available when region servers check in, and to keep up timely
ZooKeeper heartbeats. As long as there is sufficient RAM on the
combined NameNode+Master (+JobTracker) box that the system never
swaps, this is OK.

You can consider running multiple HBase masters to remove one SPOF
from the deployment, but the Hadoop side still has SPOFs of its own:
the NameNode and the JobTracker. So, yes, for a non-HA deployment it
makes sense to load all of these up on one server.

> * Is it OK for RegionServer and DataNode (+TaskTracker) to run on
> the same server? (I think this is actually advised, so data is
> local?)

Yes, this is advised for that reason. Eventually, through background
compaction, the data in HDFS which backs the region stores is brought
local. MapReduce jobs that run against HBase after this happens get
data locality, because each split corresponds to a region and its task
will be scheduled on the corresponding region server.

> I believe RegionMaster is a memory hungry (b/c of Memcache)
> process?

Yes. The more RAM you can give to the region servers, the better for

  - Read caching (block cache) to avoid needing to hit the
    filesystem to serve frequently accessed data

  - Write caching (MemStore) to ride over flushes and compactions
    without blocking clients
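
To make the RAM trade-off concrete: HBase sizes both the block cache
and the MemStore as fractions of the region server heap. The sketch
below only does that arithmetic. The function name and the fractions
used are hypothetical placeholders for illustration, not HBase
defaults or settings taken from this message.

```python
def cache_budget_mb(heap_mb, block_cache_fraction, memstore_fraction):
    """Split a region server heap between read cache, write cache, and the rest.

    Both fractions are assumed, illustrative inputs; check the real
    settings for your HBase version before relying on any numbers.
    """
    if block_cache_fraction < 0 or memstore_fraction < 0:
        raise ValueError("fractions must be non-negative")
    if block_cache_fraction + memstore_fraction >= 1:
        raise ValueError("caches must leave some heap for everything else")
    block_cache_mb = heap_mb * block_cache_fraction
    memstore_mb = heap_mb * memstore_fraction
    return {
        "block_cache_mb": block_cache_mb,  # read caching
        "memstore_mb": memstore_mb,        # write caching
        "remaining_mb": heap_mb - block_cache_mb - memstore_mb,
    }


if __name__ == "__main__":
    # Example: a 4 GB heap with 25% for reads and 50% for writes (made-up split).
    print(cache_budget_mb(4096, 0.25, 0.5))
```

Doubling the heap doubles both caches at the same fractions, which is
why more RAM helps reads and writes alike.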

> 1 or more Zookeepers            -- 1 or more dedicated boxes?

I would advise running a dedicated ZK ensemble, yes. ZK is a 2N+1
fault tolerant system: an ensemble of 2N+1 servers survives the loss
of N. So deploy 3 servers if you can stand to lose only one, or 5 if
you want to be able to lose up to 2, etc. IIRC, there are diminishing
returns after 7 or 9. Though this may seem like a lot of overhead just
to run HBase, ZK has a lot of merit on its own terms: it provides
synchronization primitives for your service or application, hosts
dynamic config (with watchers to get notice of changes), handles
presence and group membership, etc.
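
As a rough illustration of the 2N+1 rule above, here is a minimal
Python sketch. The function name is hypothetical; the only assumption
is that the ensemble stays available while a strict majority of its
servers is up.

```python
def tolerated_failures(ensemble_size):
    """Number of servers that can be lost while a strict majority survives."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    # With 2N+1 servers, N may fail; integer division covers even sizes too.
    return (ensemble_size - 1) // 2


if __name__ == "__main__":
    for size in (1, 3, 4, 5, 7):
        print(size, "servers tolerate", tolerated_failures(size), "failure(s)")
```

Note that an even-sized ensemble tolerates no more failures than the
odd size just below it, which is why odd sizes are the usual choice.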

> Non-HA system, with local disk:
> 1 HB master/NN/JT + 1 RegionServer/TT/DN + 1 ZK   =  3 boxes

Too small. In my experience you need 3 RegionServer/TT/DN boxes for
something minimally useful. Also remember to tune HDFS for such a
small cluster -- set minimum replication to 1 or 2.
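
To see what the replication setting costs on a cluster this small,
here is a back-of-the-envelope sketch. It assumes the advice above
refers to the HDFS replication factor, and the function name and disk
sizes are made up for illustration.

```python
def usable_capacity_gb(datanodes, disk_gb_per_node, replication):
    """Approximate HDFS capacity left for unique data at a replication factor."""
    if datanodes < 1 or disk_gb_per_node < 0 or replication < 1:
        raise ValueError("need at least one DataNode and replication >= 1")
    # HDFS puts at most one replica of a block on a given DataNode, so a
    # factor above the node count just leaves blocks under-replicated.
    effective = min(replication, datanodes)
    return datanodes * disk_gb_per_node / effective


if __name__ == "__main__":
    # Hypothetical 3-node cluster with 1000 GB of disk per DataNode.
    for factor in (1, 2, 3):
        print("replication", factor, "->", usable_capacity_gb(3, 1000, factor), "GB")
```

The estimate ignores non-HDFS disk usage and temporary files, so
treat it as an upper bound.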

> HA HBase cluster with HDFS:
> 2 HB masters/NNs/JTs + 2 RegionServers/TTs/DNs + 2 ZKs  =  6 boxes

Too small, likewise. 

Hope this helps, 

  - Andy

