hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: EndPoint Coprocessor could be dealocked?
Date Thu, 17 May 2012 17:39:57 GMT

Fei Ding, 

I'm a little confused. 
Are you trying to solve the problem of querying data efficiently from a table, or are you
trying to find an example of where and when to use coprocessors?

You actually have an interesting problem that isn't easily solved in relational databases,
but I don't think it's an appropriate problem if you want to stress the use of coprocessors.

Yes, with indexes you want to use coprocessors as a way to keep the index in sync with the
underlying table.

However, beyond that... the solution is really best run as an M/R job.

Consider that HBase has two different access methods: one is as part of M/R jobs, the other
is a client/server model. If you wanted to, you could create a service/engine/app that would
allow you to efficiently query and return result sets from your database, as well as manage
In part, coprocessors make this a lot easier. 

If you consider the general flow of my solution earlier in this thread, you now have a really
great way to implement this.

Note: we're really talking about allowing someone to query data from a table using multiple
indexes and index types. Think alternate table (key/value pair), Lucene/SOLR, and GeoSpatial.

You could even benchmark it against an Oracle implementation, and probably smoke it.
You could also do efficient joins between tables. 

So yeah, I would encourage you to work on your initial problem... ;-)

Just Saying...  ;-)


On May 16, 2012, at 8:49 PM, Andrew Purtell wrote:

> On Wed, May 16, 2012 at 6:43 PM, fding hbase <fding.hbase@gmail.com> wrote:
>>> Not coprocessors in general. The client side support for Endpoints
>>> (Exec, etc.) gives the developer the fiction of addressing the cluster
>>> as a range of rows, and will parallelize per-region Endpoint
>>> invocations, and collect the responses, and can return them all to the
>>> caller as "a single call".
>> But on the deadlock problem the Endpoint behaves the same way as Observer.
>> Endpoints are also executed via RPC handlers of RegionServer.
> Reread what I wrote. I'm not talking about the server side above.
> Regarding the RPC issues, yes the behavior is the same. My other point
> was there is no RPC deadlock if you schedule your additional work
> (which issues RPCs) in some background thread or Executor and return
> to the client immediately. But that is not what you have claimed you
> want to do, you want to do some distributed indexed join if I
> understood it correctly *first* (via RPC) and *then* return to the
> client. That is how you would get deadlocks.
>> the coprocessors are written by users and any kind of
>> code may appear on the server side.
> You should not let just any user run coprocessors on the server. That's madness.
> Best regards,
>    - Andy
> Problems worthy of attack prove their worth by hitting back. - Piet
> Hein (via Tom White)
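
To make Andy's point about background work concrete, here is a rough sketch in plain Java. It does not use any HBase API. The class DeferredWorkSketch, the methods rpc and handlePut, and the pool sizes are all made up for illustration: a one-thread pool stands in for the RegionServer's RPC handlers, and a second pool stands in for the background Executor. The handler hands the follow-up call (the one that itself needs a handler thread) to the background pool and returns right away, so the only handler thread is free again when the nested call arrives.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: none of these names come from HBase.
class DeferredWorkSketch {
    // Stand-in for a bounded pool of RPC handler threads (size 1 to make the point).
    private final ExecutorService handlers = Executors.newFixedThreadPool(1);
    // Stand-in for the background Executor that runs follow-up work.
    private final ExecutorService background = Executors.newSingleThreadExecutor();
    final AtomicInteger indexUpdates = new AtomicInteger();

    // Simulated RPC: every call must be served by a handler thread.
    Future<String> rpc(Callable<String> call) {
        return handlers.submit(call);
    }

    // Handler body for the primary request. The nested RPC is deferred to the
    // background pool, so this method never blocks a handler thread on it.
    String handlePut(String row) {
        background.submit(() -> {
            // Blocking here is safe: this is a background thread, not a handler.
            rpc(() -> {
                indexUpdates.incrementAndGet();
                return "indexed " + row;
            }).get();
            return null;
        });
        return "ack " + row; // reply to the client immediately
    }

    // Let queued background work finish, then stop the handler pool.
    void shutdown() throws InterruptedException {
        background.shutdown();
        background.awaitTermination(5, TimeUnit.SECONDS);
        handlers.shutdown();
        handlers.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        DeferredWorkSketch s = new DeferredWorkSketch();
        String reply = s.rpc(() -> s.handlePut("row1")).get(5, TimeUnit.SECONDS);
        System.out.println(reply);
        s.shutdown();
        System.out.println("index updates: " + s.indexUpdates.get());
    }
}
```

If handlePut instead called rpc(...).get() directly, the single handler thread would be waiting on a call that needs that same thread, which is the deadlock described in the quoted text. The trade-off is that the client gets its reply before the follow-up work has finished.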
