hbase-user mailing list archives

From Michel Segel <michael_se...@hotmail.com>
Subject Re: Fanning out hbase queries in parallel
Date Mon, 25 Jul 2011 14:37:05 GMT
Which release(s) have coprocessors enabled?

Sent from a remote device. Please excuse any typos...

Mike Segel

On Jul 24, 2011, at 11:03 PM, Sonal Goyal <sonalgoyal4@gmail.com> wrote:

> Hi Paul,
> Have you taken a look at HBase coprocessors? I think you will find them
> useful.
> Best Regards,
> Sonal
> Hadoop ETL and Data Integration <https://github.com/sonalgoyal/hiho>
> Nube Technologies <http://www.nubetech.co>
> <http://in.linkedin.com/in/sonalgoyal>
> On Mon, Jul 25, 2011 at 8:13 AM, Paul Nickerson <paul.nickerson@escapemg.com> wrote:
>> I would like to implement a multidimensional query system that aggregates
>> large amounts of data on-the-fly by fanning out queries in parallel. It
>> should be fast enough for interactive exploration of the data and extensible
>> enough to take sets of hundreds or thousands of dimensions with high
>> cardinality, and aggregate them from high granularity to low granularity.
>> Dimensions and their values are stored in the row key. For instance, row
>> keys look like this:
>> Foo=bar,blah=123
>> and each row contains numerical values within their column families, such
>> as plays=100, versioned by the date of calculation.
>> Suppose a user wants the top "Foo" values with blah=123, sorted in
>> descending order by total plays in July. My current thinking is that a query
>> would get executed by grouping all Foo-prefixed row keys by region server
>> and sending the query to each of those. Each region server iterates through
>> all of its row keys that start with Foo=something,blah=, and passes the
>> query on to all regions containing blahs that equal 123, which then contain
>> play counts. Matching row keys, as well as the sum of all their play values
>> within July, are passed back up the chain and sorted/truncated when possible.
>> It seems quite complicated and would involve either modifying the HBase
>> source code or, at the very least, using the deep internals of the API. Does
>> this seem like a practical solution, or could someone offer some ideas?
>> Thank you!
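
The scatter-gather plan described in the quoted message can be sketched roughly as follows. This is a minimal in-memory illustration only, not HBase code: the `REGIONS` list, the sample rows, and the functions `parse_key`, `scan_region`, and `top_values` are all hypothetical stand-ins, and dates are assumed to be stored as `YYYY-MM-DD` strings. Each "region" computes partial sums for its own rows in parallel, and the caller merges the partials before sorting and truncating.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for regions: each maps a row key to {date: plays}.
REGIONS = [
    {
        "Foo=bar,blah=123": {"2011-07-01": 100, "2011-07-15": 50, "2011-06-30": 999},
        "Foo=bar,blah=456": {"2011-07-02": 70},
    },
    {
        "Foo=baz,blah=123": {"2011-07-03": 30, "2011-07-20": 200},
        "Foo=qux,blah=123": {"2011-07-04": 10},
    },
]


def parse_key(row_key):
    """Split 'Foo=bar,blah=123' into {'Foo': 'bar', 'blah': '123'}."""
    return dict(part.split("=", 1) for part in row_key.split(","))


def scan_region(region, group_dim, filters, date_prefix):
    """Sum plays per group_dim value for rows in one region that match filters."""
    partial = Counter()
    for row_key, versions in region.items():
        dims = parse_key(row_key)
        if group_dim not in dims:
            continue
        if any(dims.get(k) != v for k, v in filters.items()):
            continue
        # Only count versions whose date falls inside the requested period.
        partial[dims[group_dim]] += sum(
            plays for date, plays in versions.items() if date.startswith(date_prefix)
        )
    return partial


def top_values(regions, group_dim, filters, date_prefix, limit):
    """Fan the scan out to every region, merge partial sums, then sort and truncate."""
    total = Counter()
    with ThreadPoolExecutor() as pool:
        partials = pool.map(
            lambda r: scan_region(r, group_dim, filters, date_prefix), regions
        )
        for partial in partials:
            total.update(partial)  # Counter.update adds counts together
    return total.most_common(limit)


# Top 2 "Foo" values with blah=123, ranked by total plays in July 2011.
result = top_values(REGIONS, "Foo", {"blah": "123"}, "2011-07", 2)
print(result)
```

In this sketch the truncation happens only after all partial sums are merged, since rows for the same Foo value may sit in more than one region; truncating per region first could drop a value that would rank highly once its partial sums are combined.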
