hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Any successful story of an HBase cell for 'analytics job' plus 'realtime serving'?
Date Sun, 04 Jul 2010 10:49:54 GMT
> From: Sean Bigdatafun
> What I am thinking of is the following scenario:
> -- 1) I want to store my hourly web traffic into a fact
> table hourly into Table A
> -- 2) I want to invoke map-reduce to generate aggregated
> table like trends/web-usage-summary into Table B
> -- 3) I want to serve end user's query from Table B.

I have successfully done something like this in the past, on an experimental cluster. You
must adjust the size of the cluster from time to time (we started with 15, went to 25) and spend
time on trial and error with MapReduce tasktracker and job spec tuning to ensure the scanning
query load on the cluster does not cause user query latency to fall out of tolerance.
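
The trial-and-error loop described above can be sketched roughly as follows. This is only an illustration, not something from the original setup: the function names, the use of a nearest-rank 99th percentile as the "tolerance" measure, and the idea of keying trials by a concurrent-map-task setting are all assumptions.

```python
import math


def p99(samples_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies (ms)."""
    ordered = sorted(samples_ms)
    rank = math.ceil(0.99 * len(ordered))
    return ordered[rank - 1]


def largest_safe_setting(trials, tolerance_ms):
    """Pick the most aggressive MapReduce setting that keeps users happy.

    `trials` maps a (hypothetical) concurrent-map-task setting to the user
    query latencies, in ms, observed while jobs ran with that setting.
    Returns the largest setting whose p99 latency stays within
    `tolerance_ms`, or None if no tried setting is acceptable.
    """
    safe = [
        setting
        for setting, samples in trials.items()
        if p99(samples) <= tolerance_ms
    ]
    return max(safe) if safe else None
```

Each entry in `trials` would come from an actual measurement run on the cluster; if nothing fits the tolerance, that is the signal to grow the cluster rather than tune further.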

There has been some recent talk about introducing QoS into HBase RPC: https://issues.apache.org/jira/browse/HBASE-2782.
What is proposed there is a narrowly scoped issue regarding META. But I could see an RPC QoS scheme with
adjustable priorities:

     META (highest)

which would, as a rule, minimize single-row query latency at the expense of everything but system
operations, if that is your choice, and possibly do it well enough that you don't need to tune
beyond setting the RPC QoS priorities.
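
To make the idea of adjustable priorities concrete, here is a minimal sketch of a strict-priority request queue. It is not HBase's RPC code and does not reflect what HBASE-2782 proposes; the class name and the `SINGLE_ROW` and `SCAN` levels are hypothetical, chosen only to mirror the single-row versus scanning load discussed above, with META served first.

```python
import heapq
import itertools

# Hypothetical priority levels; a lower number is served first.
DEFAULT_PRIORITIES = {"META": 0, "SINGLE_ROW": 1, "SCAN": 2}


class PriorityRpcQueue:
    """Strict-priority queue: always hands out the most urgent request."""

    def __init__(self, priorities=None):
        self._priorities = dict(priorities or DEFAULT_PRIORITIES)
        self._heap = []
        # Monotonic counter breaks ties, so requests at the same
        # priority level come out in arrival (FIFO) order.
        self._seq = itertools.count()

    def set_priority(self, kind, level):
        """Adjust a priority; applies to requests enqueued afterwards."""
        self._priorities[kind] = level

    def put(self, kind, request):
        entry = (self._priorities[kind], next(self._seq), kind, request)
        heapq.heappush(self._heap, entry)

    def get(self):
        _, _, kind, request = heapq.heappop(self._heap)
        return kind, request

    def __len__(self):
        return len(self._heap)
```

With the default levels, a queued scan waits behind every pending META and single-row request, which is the "at the expense of" trade-off: strict ordering like this can starve the lowest level under sustained load.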

Regarding HBase RPC in general, we are going to need to think about supporting security features
in Hadoop and dynamic runtime method extension for coprocessors. That makes HBASE-2182 worth
a hard look (https://issues.apache.org/jira/browse/HBASE-2182).

  - Andy

