hbase-user mailing list archives

From Paul Nickerson <paul.nicker...@escapemg.com>
Subject Re: Fanning out hbase queries in parallel
Date Mon, 25 Jul 2011 04:45:51 GMT
This looks to be exactly what I need. Thanks :) 

----- Original Message -----

From: "Sonal Goyal" <sonalgoyal4@gmail.com> 
To: user@hbase.apache.org 
Sent: Monday, July 25, 2011 12:03:30 AM 
Subject: Re: Fanning out hbase queries in parallel 

Hi Paul, 

Have you taken a look at HBase coprocessors? I think you will find them 
useful. 

Best Regards, 
Sonal 
Hadoop ETL and Data Integration <https://github.com/sonalgoyal/hiho> 
Nube Technologies <http://www.nubetech.co> 

<http://in.linkedin.com/in/sonalgoyal> 





On Mon, Jul 25, 2011 at 8:13 AM, Paul Nickerson <paul.nickerson@escapemg.com> wrote: 

> 
> I would like to implement a multidimensional query system that aggregates 
> large amounts of data on-the-fly by fanning out queries in parallel. It 
> should be fast enough for interactive exploration of the data and extensible 
> enough to take sets of hundreds or thousands of dimensions with high 
> cardinality, and aggregate them from high granularity to low granularity. 
> Dimensions and their values are stored in the row key. For instance, row 
> keys look like this 
> Foo=bar,blah=123 
> and each row contains numerical values within their column families, such 
> as plays=100, versioned by the date of calculation. 
> A user wants the top "Foo" values with blah=123, sorted in descending order 
> by total plays in July. My current thinking is that a query would be executed 
> by grouping all Foo-prefixed row keys by region server and sending the query 
> to each of those. Each region server iterates through all of its row keys 
> that start with Foo=something,blah=, and passes the query on to all regions 
> containing blahs that equal 123, which then contain play counts. Matching 
> row keys, as well as the sum of all their play values within July, are 
> passed back up the chain and sorted/truncated when possible. 
> 
> 
> It seems quite complicated and would involve either modifying the HBase 
> source code or, at the very least, using the deep internals of the API. Does 
> this seem like a practical solution, or could someone offer some ideas? 
> 
> 
> Thank you! 
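
The scatter-gather flow described in the quoted message can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not HBase code: the REGIONS list, the function names (parse_key, scan_region, top_values), and the sample rows are all hypothetical, and no HBase client or coprocessor API is used. Each "region" computes a partial sum per Foo value, and the caller merges the partials and sorts them.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import date

# Hypothetical stand-in for regions: each region is a list of
# (row_key, [(calculation_date, plays), ...]) pairs. Sample data is made up.
REGIONS = [
    [
        ("Foo=bar,blah=123", [(date(2011, 7, 1), 100), (date(2011, 7, 15), 50)]),
        ("Foo=bar,blah=456", [(date(2011, 7, 2), 999)]),
    ],
    [
        ("Foo=baz,blah=123", [(date(2011, 6, 30), 500), (date(2011, 7, 3), 70)]),
        ("Foo=qux,blah=123", [(date(2011, 7, 4), 200)]),
    ],
]


def parse_key(row_key):
    """Split 'Foo=bar,blah=123' into {'Foo': 'bar', 'blah': '123'}."""
    return dict(part.split("=", 1) for part in row_key.split(","))


def scan_region(region, group_dim, filters, start, end):
    """Per-region step: sum plays per group_dim value for matching rows."""
    partial = defaultdict(int)
    for row_key, cells in region:
        dims = parse_key(row_key)
        if group_dim not in dims:
            continue
        # Skip rows whose dimensions do not match every filter.
        if any(dims.get(name) != value for name, value in filters.items()):
            continue
        for calc_date, plays in cells:
            if start <= calc_date <= end:
                partial[dims[group_dim]] += plays
    return partial


def top_values(regions, group_dim, filters, start, end, limit):
    """Fan the query out to every region in parallel, then merge and sort."""
    totals = defaultdict(int)
    with ThreadPoolExecutor() as pool:
        partials = pool.map(
            lambda region: scan_region(region, group_dim, filters, start, end),
            regions,
        )
        for partial in partials:
            for value, plays in partial.items():
                totals[value] += plays
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


result = top_values(
    REGIONS, "Foo", {"blah": "123"}, date(2011, 7, 1), date(2011, 7, 31), limit=2
)
print(result)  # [('qux', 200), ('bar', 150)]
```

In this sketch the truncation happens only after the merge. Truncating inside each region is safe only when all rows for a given Foo value are known to live in a single region; otherwise a value that ranks low in every region could still rank high in total.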

