hbase-user mailing list archives

From Anthony Urso <antho...@cs.ucla.edu>
Subject Re: High throughput input, low latency output?
Date Sat, 08 Oct 2011 19:18:24 GMT
On Fri, Oct 7, 2011 at 8:58 PM, Stack <stack@duboce.net> wrote:
> On Fri, Oct 7, 2011 at 12:43 PM, Anthony Urso <anthonyu@cs.ucla.edu> wrote:
>> We have a use case that will require a ten to twenty EC2 node HBase
>> cluster to take several hundred million rows of input from a larger
>> number of EMR instances in daily bursts, and then serve those rows via
>> low latency random reads, say on the order of 300 or so rows per
>> second. Before we start coding, I thought it best to ask the experts
>> for their advice.
>> 1) Is this something that HBase will be able to handle gracefully?
> You might have some chance if you were not on EC2.

Is that because of the slow disk I/O?

> Any chance of caching working?  Are the reads totally random or will
> there be 'hot' areas?  If so, you might have some hope.

Hopefully.  Do you mean external caching like memcached, or OS-level disk caching?

>> 2) Does anyone have any pointers on how to tune HBase for performance
>> and stability under this load?
> See performance section on book up on hbase.org (though there should
> probably be EC2 caveats...)


>> 3) Would HBase perform better under this sort of load on twelve large
>> EC2 instances, six xlarge or three xxlarge?
> The more nodes the better.  And if those nodes are not virtualized,
> better still.  But then there is the network and if it's saturated....
> Can you run some tests before you start coding?

Good idea.
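
A minimal sketch of what such a pre-coding test could look like, not anything taken from the thread: it times random single-row reads and reports median latency, 99th-percentile latency, and sequential reads per second. The names `read_row`, `measure_random_reads` and `summarize` are hypothetical, and the in-memory dict is only a stand-in; for a real test, the body of `read_row` would be replaced with an actual get against the cluster.

```python
import random
import statistics
import time


def read_row(store, key):
    # Hypothetical stand-in for one random read (e.g. a single-row get).
    # Here it is just a dict lookup so the sketch runs anywhere.
    return store.get(key)


def measure_random_reads(store, keys, n_reads, seed=0):
    """Issue n_reads uniformly random lookups; return per-read latencies in seconds."""
    rng = random.Random(seed)
    latencies = []
    for _ in range(n_reads):
        key = rng.choice(keys)
        start = time.perf_counter()
        read_row(store, key)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Return median latency, 99th-percentile latency, and sequential reads per second."""
    ordered = sorted(latencies)
    p99 = ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]
    total = sum(ordered)
    return {
        "median_s": statistics.median(ordered),
        "p99_s": p99,
        "reads_per_s": len(ordered) / total if total > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Toy data set; a real test would use production-sized rows and key counts.
    store = {f"row-{i}": i for i in range(100_000)}
    keys = list(store)
    print(summarize(measure_random_reads(store, keys, 10_000)))
```

Uniformly random keys are the worst case for caching. If the real workload has 'hot' areas, drawing keys from a skewed distribution instead would give a more representative picture.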

> St.Ack
