hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: High throughput input, low latency output?
Date Sat, 08 Oct 2011 03:58:59 GMT
On Fri, Oct 7, 2011 at 12:43 PM, Anthony Urso <anthonyu@cs.ucla.edu> wrote:
> We have a use case that will require a ten to twenty EC2 node HBase
> cluster to take several hundred million rows of input from a larger
> number of EMR instances in daily bursts, and then serve those rows via
> low latency random reads, say on the order of 300 or so rows per
> second. Before we start coding, I thought it best to ask the experts
> for their advice.
> 1) Is this something that HBase will be able to handle gracefully?

You might have some chance if you were not on EC2.

Any chance of caching working?  Are the reads totally random or will
there be 'hot' areas?  If so, you might have some hope.
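
To make the caching question concrete, here is a rough sketch (not HBase code, and not how the HBase block cache is implemented) that replays a synthetic read stream through a plain LRU cache and reports the hit ratio. The row counts, cache size, and the "1% of rows get 90% of reads" skew are all made-up assumptions for illustration; plug in whatever your real access pattern looks like.

```python
import random
from collections import OrderedDict


def lru_hit_ratio(keys, cache_size):
    """Replay row keys through a simple LRU cache and return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for key in keys:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


def uniform_reads(rng, num_rows, num_reads):
    """Totally random reads: every row is equally likely."""
    return [rng.randrange(num_rows) for _ in range(num_reads)]


def hot_spot_reads(rng, num_rows, num_reads, hot_fraction=0.01, hot_share=0.9):
    """Skewed reads: hot_share of reads land on the first hot_fraction of rows.

    Both parameters are hypothetical; assumes num_rows >= 2.
    """
    hot_rows = max(1, min(num_rows - 1, int(num_rows * hot_fraction)))
    reads = []
    for _ in range(num_reads):
        if rng.random() < hot_share:
            reads.append(rng.randrange(hot_rows))
        else:
            reads.append(rng.randrange(hot_rows, num_rows))
    return reads


if __name__ == "__main__":
    rng = random.Random(42)
    num_rows, num_reads, cache_size = 100_000, 20_000, 1_000  # assumed sizes
    print("uniform :", lru_hit_ratio(uniform_reads(rng, num_rows, num_reads), cache_size))
    print("hot spot:", lru_hit_ratio(hot_spot_reads(rng, num_rows, num_reads), cache_size))
```

With totally random reads the cache can only help in proportion to how much of the data set fits in it; with hot areas a small cache can absorb most of the reads.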

> 2) Does anyone have any pointers on how to tune HBase for performance
> and stability under this load?

See the performance section of the book up on hbase.org (though there
should probably be EC2 caveats...)

> 3) Would HBase perform better under this sort of load on twelve large
> EC2 instances, six xlarge or three xxlarge?

The more nodes the better.  And if those nodes are not virtualized,
better still.  But then there is the network, and if it's saturated....

Can you run some tests before you start coding?
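
A test does not need to be elaborate. Below is a minimal sketch of a read-latency harness: it times a caller-supplied read function over a list of keys and reports nearest-rank percentiles. `read_row` is a hypothetical placeholder for whatever client call you end up using against a trial cluster; nothing here is an HBase API, and the harness does not pace requests to a target rate.

```python
import math
import time


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("need at least one sample")
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def measure_read_latency(read_row, keys):
    """Call read_row(key) for each key and summarize latency in seconds.

    read_row is a hypothetical callable supplied by the caller.
    """
    latencies = []
    for key in keys:
        start = time.perf_counter()
        read_row(key)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "p50": percentile(latencies, 50),
        "p99": percentile(latencies, 99),
        "max": latencies[-1],
    }


if __name__ == "__main__":
    # Stand-in read function; replace with a real client call when testing.
    def fake_read(key):
        time.sleep(0.001)

    print(measure_read_latency(fake_read, range(200)))
```

Run it while a bulk load is in progress as well as on an idle cluster, since the interesting number is read latency during the daily burst.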
