hbase-user mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject RE: Slow Inserts on EC2 Cluster
Date Wed, 01 Sep 2010 17:04:15 GMT
While I completely agree with much of what you're saying, and am usually one of the first to
encourage people to not use virtual machines w/ HBase, I know of several successful deployments
of HBase on EC2.  In most instances there was some pain encountered, but it does work for

I've not seen the specific issues you seem to be running into (periodically spiking load
but no CPU or iowait).

I'm not sure I know what HBase could do to operate better in these environments.  I'm not
sure I understand exactly what is happening to RS and ZooKeeper when EC2 is being weird. 
You can't talk to ZK because of a networking issue?  Have you dug into the ZK server logs
to see what's up?
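
Alongside the server logs, one quick way to tell "network is down" from "ZK is up but
unresponsive" is to send the ZooKeeper four-letter commands (ruok, stat) from a region
server host. Below is a minimal sketch in Python; the function name, the default port
2181 and the timeout are assumptions for illustration, and newer ZooKeeper releases may
require these commands to be enabled explicitly.

```python
import socket


def zk_four_letter(host, port=2181, cmd="ruok", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw reply text.

    host/port/timeout are illustrative defaults, not values from this thread.
    Raises an OSError (e.g. a timeout) if the server cannot be reached or
    does not answer within `timeout` seconds.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

A healthy server answers "imok" to zk_four_letter(host, cmd="ruok"). A timeout during one
of the load spikes would point at the network or a paused instance rather than at HBase
itself, and the "stat" reply includes latency figures worth comparing before and after a
spike.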

HBase is a highly available service, so we need to do heartbeating of some kind; loss of network
connectivity is a killer.

It could also be that ZK is being starved of IO so that it cannot write to its transaction
log and that is what is slowing it down.
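
To check the IO starvation theory, you could time fsync calls on the volume that holds
the ZK transaction log (the dataLogDir setting, or dataDir if no separate log directory
is configured). This is only a rough sketch: the function name, sample count and payload
size are assumptions, and it measures the disk from Python rather than what ZK itself
sees.

```python
import os
import tempfile
import time


def fsync_latencies(directory, samples=20, payload=b"x" * 512):
    """Return fsync latencies (seconds) for small appends in `directory`.

    `directory` should sit on the same volume as the ZooKeeper transaction
    log; samples and payload size are arbitrary illustrative defaults.
    """
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # force the write to stable storage, as a log sync would
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)  # remove the scratch file so nothing is left behind
    return latencies
```

Running it periodically and logging max(fsync_latencies(...)) would show whether sync
times jump at the same moments the load spikes and the region servers lose their
sessions.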


> -----Original Message-----
> From: Matthew LeMieux [mailto:mdl@mlogiciels.com]
> Sent: Wednesday, September 01, 2010 7:25 AM
> To: user@hbase.apache.org
> Subject: Re: Slow Inserts on EC2 Cluster
> I'm starting to find that EC2 is not reliable enough to support HBase.
> I'm running into 2 things that might be related:
> 1) On idle machines that are apparently doing nothing (reports of <3%
> CPU utilization, no I/O wait)  the load is reported as being higher
> than the number of cores.   I don't know if attachments work on the
> mailing list, but I attached a small image anyway to illustrate this
> confusing thing.  (I've been using m1.large and m2.xlarge running CDH3)
> 2) Every once in a while it seems that somebody hits the pause button
> on one of my instances, and while the CPU utilization stays low, the
> load value spikes to a high value.  When this happens the region
> servers decide to close up shop.  It appears to be a problem with
> contacting zookeeper servers (who happen to stay up and running, but
> perhaps somewhat unresponsive when Amazon decides to hit the pause
> button).  I have extended the timeout for contacting zookeeper servers,
> but these events continue to persist.  One such event happened 8 hours
> ago, and I still can't get HBase back up and running.
> I've seen many comments on this list informing users that they are
> using hardware (or virtual machines) that are simply not big enough,
> not fast enough, or don't have enough memory.  I'd like to offer an
> alternative point of view.  Whether or not EC2 will last is uncertain,
> but cloud computing environments will definitely be around for a long
> time.  What would it take to make HBase resilient enough to take
> advantage of those environments?  Based on my experience and comments
> on this list, it seems "HBase in the cloud" is still a rather painful
> proposition.
> -Matthew
