hbase-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Thoughts on a hybrid HBase-Hadoop cluster
Date Tue, 13 Dec 2011 19:44:07 GMT

I was wondering if I could get some feedback on the craziness (or not) of setting up a hybrid
HBase-Hadoop cluster that has the following primary uses:

1) continuous writes to HBase
2) disk and CPU intensive reads from HBase by MR jobs and writes of aggregated data back to
HBase by those jobs
3) occasional reads by people/reporting apps that read aggregates from HBase

I'm calling this hybrid HBase-Hadoop cluster because not all nodes in the cluster would be
running both a RegionServer and DataNode + TaskTracker.
Instead, this is what it could look like:

* a set of *larger* nodes running RegionServers, DataNodes, TaskTrackers (e.g., large EC2 instances)
* a set of *smaller* nodes running only DNs and TTs, but *not* RSs (e.g. small EC2 instances)

The thinking here is that because 2) above needs to process a lot of data (lots of reads,
a good amount of writes, and relatively CPU-intensive work), it's nice to have more nodes and spindles.
But if we put RSs on all nodes to keep them close to the DNs, then all nodes need to be relatively
beefy in terms of RAM to keep HBase happy, and that translates to more $$$.
So the thinking/hope is that one could save $ by having more smaller/cheaper nodes to do the
disk IO and CPU-intensive work, while having just enough RS instances on the big nodes to
handle the HBase side of 1), 2), and 3) above.

Is the above setup crazy?

Are there some obvious flaws that would really cause operational or performance pains?
Would such a cluster have major performance issues because of data that needs to be transferred
between DNs that are on all nodes and RSs running only on the big nodes?
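
To make the data transfer question a bit more concrete, here is a rough back-of-the-envelope
sketch in Python. The node counts and slot counts are made-up numbers and the function name
is hypothetical; nothing here is measured. It only computes what share of map slots would sit
on a node that also runs a RS. Any task that lands on a small node has no local RS, so its
HBase reads and writes have to go over the network to a RS on one of the big nodes.

```python
def rs_local_slot_fraction(big_nodes, small_nodes, slots_per_big, slots_per_small):
    """Return the fraction of map slots that sit on a node that also runs a RegionServer.

    Assumption: RegionServers run only on the big nodes, while TaskTrackers
    (and therefore map slots) run on every node.
    """
    big_slots = big_nodes * slots_per_big
    small_slots = small_nodes * slots_per_small
    total_slots = big_slots + small_slots
    if total_slots == 0:
        raise ValueError("cluster has no map slots")
    return big_slots / total_slots


# Hypothetical layout: 4 big nodes with 8 map slots each,
# 12 small nodes with 2 map slots each.
fraction = rs_local_slot_fraction(4, 12, 8, 2)
print(f"{fraction:.2f} of map slots are co-located with a RegionServer")
```

This says nothing about how the scheduler would actually place tasks; it is just an upper
bound on how much of the MR work could talk to a RS on the same machine.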

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/