hbase-user mailing list archives

From Dan Harvey <dan.har...@mendeley.com>
Subject Re: Multiple HBase instances
Date Thu, 21 Oct 2010 10:48:25 GMT
Thanks Stack,

We're using Hadoop/HBase for backend processing and storage, and serving out
of another database, so we don't have an SLA for HBase response times, but
we might have one for production jobs finishing by a given time.

In this type of usage, if we were to have around 40 nodes, what in your
experience would be a sensible way to split them between production,
staging and development HBase clusters? We think about 10 nodes' worth of
resources will be needed for production and the rest for staging/dev.

We've thought about having two Hadoop clusters, one for production and one
for staging and development, with the latter running multiple HBase
instances. But as multiple Hadoop clusters mean quite a bit more admin
overhead and management hardware (namenodes, etc.), it might be simpler to
just have one. We would then also have more nodes to spread the work across,
whether there is more work in production or in development, and in case of
failure. But then we have possible issues with test jobs slowing down
production jobs, etc. Have you found things like that to be much of an
issue, or have you separated the test jobs onto other clusters?

The nodes we're using will have around 8 cores, 4 disks and 16-24GB RAM
each. Also, if you can say, what size of clusters do you use there? Or, more
generally, what minimum cluster size have you found HBase to perform well
with for tables of, say, 10,000 regions of the default region size?

Also, with 10-20 nodes, is it worth having dedicated machines for the
ZooKeeper / master nodes, or can they be mixed in fine? How does the
0.89/0.90 release change the workloads for ZooKeeper and the masters?

Quite a few questions, but I hope they'll be useful for others getting
started! That is one of the harder parts of starting to use HBase / Hadoop
more. Maybe it would be a good idea to put some of this up on the wiki, or
case studies of different people's experiences with workloads / hardware?
I could have a go at starting that off with what I've found when I get some
time.


On 20 October 2010 15:29, Stack <stack@duboce.net> wrote:

> Hey Dan:
> On Wed, Oct 20, 2010 at 2:09 AM, Dan Harvey <dan.harvey@mendeley.com> wrote:
> > Hey,
> >
> > We're just looking into ways to run multiple instances/versions of HBase
> > for testing/development and were wondering how other people have gone
> > about doing this.
> >
> Development of the replication feature has made it so tests can now put up
> multiple concurrent clusters. See TestHBaseClusterUtility, which
> starts up three clusters in the one JVM, each homed on its own
> directory in a single zookeeper instance and each running its own hdfs
> (having them share an hdfs should work too, though it might need some
> HBaseTestingUtility fixup).
> At SU, there are multiple clusters: a low-latency serving cluster
> (replicating to a backup cluster) and then a cluster for MR jobs, dev
> clusters, etc. Generally these don't share hdfs, though again
> clusters with like SLAs could.
> > If we used just one hadoop cluster then we could have different paths /
> > users for each hbase instance, and then have a set of zookeeper nodes for
> > each instance (or run multiple zk's on each server binding to different
> > hosts for each instance..).
> You could do that. Have them all share the same zk ensemble (run one per
> datacenter?).
> > If we used multiple hadoop clusters then the only difference would be
> > using a different hdfs for storing the data.
> >
> > Does anyone have experiences with problems or benefits to either of the
> > above?
> >
> > I'm tempted to go towards the single cluster for more efficient use of
> > hardware but I'm not sure if that's a good idea or not.
> >
> At SU the cluster serving the frontend is distinct from the cluster
> running the heavy-duty MR jobs. When a big MR job started up, the
> front-end latency tended to suffer. There might be some ratio of HDFS
> nodes to HBase nodes that would make it so the low-latency and MR clusters
> could share HDFS, but I've not done the work to figure it out.
> St.Ack
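A minimal sketch of the single-HDFS, shared-ZooKeeper layout discussed above, assuming HBase 0.90-era property names and default ports: each instance gets its own `hbase.rootdir` in the shared HDFS and its own `zookeeper.znode.parent` in the shared ensemble, and instances that share hosts shift their daemon ports by an offset. The hostnames, the `hbase-<instance>` naming scheme, the offset of 100 and the helpers `site_properties` and `to_site_xml` are hypothetical, for illustration only; check property names and defaults against the hbase-default.xml shipped with your release.

```python
import xml.etree.ElementTree as ET

# Assumed 0.90-era default ports; co-located instances shift them by an offset.
DEFAULT_PORTS = {
    "hbase.master.port": 60000,
    "hbase.master.info.port": 60010,
    "hbase.regionserver.port": 60020,
    "hbase.regionserver.info.port": 60030,
}


def site_properties(instance, namenode, zk_quorum, port_offset=0):
    """Return hbase-site.xml properties for one (hypothetical) HBase instance.

    All instances share one HDFS (namenode) and one ZooKeeper ensemble
    (zk_quorum). They are kept apart by a per-instance HDFS root directory,
    a per-instance parent znode and, if they share hosts, shifted ports.
    """
    props = {
        "hbase.cluster.distributed": "true",
        "hbase.rootdir": f"{namenode}/hbase-{instance}",
        "zookeeper.znode.parent": f"/hbase-{instance}",
        "hbase.zookeeper.quorum": ",".join(zk_quorum),
    }
    for name, port in DEFAULT_PORTS.items():
        props[name] = str(port + port_offset)
    return props


def to_site_xml(props):
    """Render a property dict in the <configuration>/<property> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    namenode = "hdfs://namenode.example:8020"  # hypothetical namenode URI
    quorum = ["zk1.example", "zk2.example", "zk3.example"]  # hypothetical hosts
    # staging keeps the default ports, dev shifts every port by 100
    for offset, name in enumerate(["staging", "dev"]):
        print(f"--- hbase-site.xml for {name} ---")
        print(to_site_xml(site_properties(name, namenode, quorum, offset * 100)))
```

Separating instances by root directory and parent znode is what lets one Hadoop cluster and one ensemble serve several HBase instances; it does nothing about the contention Stack describes between big MR jobs and low-latency serving on shared hardware.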

Dan Harvey | Datamining Engineer

Mendeley Limited | London, UK | www.mendeley.com
Registered in England and Wales | Company Number 6419015
