hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: EndPoint Coprocessor could be dealocked?
Date Fri, 18 May 2012 10:40:02 GMT
Fei Ding,

I think you're making the solution harder than it should be. 

To start with, the only thing you need to do is use co-processors to keep the indexes in sync
with the underlying table. 

The code called from the co-processor will depend on the type of action and the type of index
you are using. 
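
As a rough illustration of what "keeping the index in sync" means, here is a minimal in-memory sketch. It does not use any HBase API: the class `IndexedTable` and its `put`, `delete`, and `lookup` methods are hypothetical names, and plain dictionaries stand in for the base table and the index table. The index updates inside `put` and `delete` mark the place where code triggered by a co-processor would run, assuming a simple value-to-row-key index.

```python
from collections import defaultdict


class IndexedTable:
    """In-memory stand-in for a base table plus one secondary index table."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                 # row key -> {column: value}
        self.index = defaultdict(set)  # indexed value -> set of row keys

    def _unindex(self, row_key):
        # Remove the stale index entry for this row, if there is one.
        old = self.rows.get(row_key, {}).get(self.indexed_column)
        if old is not None:
            self.index[old].discard(row_key)
            if not self.index[old]:
                del self.index[old]

    def put(self, row_key, columns):
        # Action type "put": drop the old index entry, write the row,
        # then index the new value.
        self._unindex(row_key)
        merged = dict(self.rows.get(row_key, {}))
        merged.update(columns)
        self.rows[row_key] = merged
        value = merged.get(self.indexed_column)
        if value is not None:
            self.index[value].add(row_key)

    def delete(self, row_key):
        # Action type "delete": the index entry must go away with the row.
        self._unindex(row_key)
        self.rows.pop(row_key, None)

    def lookup(self, value):
        # Index read: row keys whose indexed column equals `value`.
        return sorted(self.index.get(value, ()))
```

A different index type (for example a text or geospatial index) would need different code in the same two places, which is the dependency on action type and index type described above.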

Then you only need to focus on how you use the index and how you implement the intersection
of the result sets. 

One idea I had was to invert the intersection table so that you would have N rows, where each
row contains the result set for one match count. Then you fetch one row to get your row keys. 
So if you have 3 indexes and want to find their intersection, fetching the row with a key
value of 3 would yield the intersection, rather than scanning the key values and fetching
the intersection count for each. (This could work, but you may have issues with very large
result sets. How many columns can you have?)
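
One possible reading of the inverted intersection table, sketched in memory: count how many indexes returned each row key, then invert that so the match count becomes the key and the row keys become the stored set. The function names `build_inverted_intersection` and `intersection` are hypothetical, and a dictionary stands in for the table. This is an assumption about the layout, not a tested HBase design.

```python
from collections import Counter, defaultdict


def build_inverted_intersection(index_results):
    """Map each match count to the row keys returned by that many indexes.

    index_results: one iterable of row keys per index lookup.
    """
    counts = Counter()
    for result in index_results:
        for row_key in set(result):  # count each index at most once per key
            counts[row_key] += 1

    inverted = defaultdict(set)
    for row_key, n in counts.items():
        inverted[n].add(row_key)
    return dict(inverted)


def intersection(index_results):
    """Fetch the single 'row' whose key equals the number of indexes."""
    inverted = build_inverted_intersection(index_results)
    return sorted(inverted.get(len(index_results), set()))
```

With three index result sets, `intersection` reads only the entry keyed by 3. In a wide-row layout each matching row key would be a column in that one row, which is where the concern about very large result sets comes from.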


The point is that if you place your focus first on the problem and then secondly on the mechanics
you will have an easier time solving the problem. The only catch is that you have to be able
to work in the abstract.

HTH

-Mike

PS. This really is an interesting problem which, when solved, will help with the evolution of
HBase more as a database than as a persistent object store. 

On May 17, 2012, at 7:38 PM, fding hbase wrote:

> Hi Michel,
> On Fri, May 18, 2012 at 1:39 AM, Michael Segel <michael_segel@hotmail.com> wrote:
> 
>>> You should not let just any user run coprocessors on the server. That's
>>> madness.
>>> 
>>> Best regards,
>>> 
>>>   - Andy
>> 
>> Fei Ding,
>> 
>> I'm a little confused.
>> Are you trying to solve the problem of querying  data efficiently from a
>> table, or are you trying to find an example of where and when  to use
>> co-processors?
>> 
>> 
> I'm trying to solve the problem of querying data efficiently. Coprocessor
> is one of the possible solutions that I've tried.
> 
> 
>> You actually have an interesting problem that isn't easily solved in
>> relational databases, but I don't think it's an appropriate problem if you
>> want to stress the use of coprocessors.
>> 
>> Yes with Indexes you want to use coprocessors as a way to keep the index
>> in synch with the underlying table.
>> 
>> However beyond that... the solution is really best run as a M/R job.
>> 
>> Consider that HBase has two different access methods: one is as part of
>> M/R jobs, the other is a client/server model.  If you wanted to, you could
>> create a service/engine/app that would allow you to efficiently query and
>> return result sets from your database, as well as manage indexes.
>> In part, coprocessors make this a lot easier.
>> 
> 
> I'm not using the coprocessors to maintain index tables, but using an extended
> client to do this.
> 
> 
>> 
>> If you consider the general flow of my solution earlier in this thread,
>> you now have a really great way to implement this.
>> 
>> Note: we're really talking about allowing someone to query data from a
>> table using multiple indexes and index types. Think alternate table
>> (key/value pair) , Lucene/SOLR, and GeoSpatial.
>> 
>> You could even benchmark it against an Oracle implementation, and
>> probably smoke it.
>> You could also do efficient joins between tables.
>> 
>> So yeah, I would encourage you to work on your initial problem... ;-)
>> 
>> 
> An alternate table is also one of the possible solutions; however, it's not
> that easy either.  I'm still working on it. ;-)
> 
> -- 
> 
> Best Regards!
> 
> Fei Ding
> fding.church@gmail.com

