hbase-user mailing list archives

From fding hbase <fding.hb...@gmail.com>
Subject Re: EndPoint Coprocessor could be dealocked?
Date Mon, 14 May 2012 13:20:48 GMT
Hi Michel,

I indexed each column within a column family of a table, so we can query
rows by a specific column value.
By multi-index I mean using multiple indexes at the same time in a single
query. It is like a SQL select whose WHERE clause has conditions on two
indexed columns.

The row key of the index table is made up of the column value followed by
the row key of the indexed table. For set intersection I used
CollectionUtils.intersection() from the Apache Commons Collections package.
I make no assumption about the sort order of the indexes. A scan of the
index table with the column value as startKey and column value+1 as endKey
returns all rows in the indexed table with that column value.
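
The key layout and range scan described above can be sketched roughly as
follows. This is only an illustration, not the code I run: it uses a sorted
in-memory map as a stand-in for the index table, the class and method names
(IndexKeySketch, indexKey, stopKeyFor, lookup) are made up for the example,
and it assumes column values have a fixed width so that one value is never
a prefix of another.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class IndexKeySketch {
    // Unsigned lexicographic byte order, which is how HBase sorts row keys.
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    // Index row key = column value bytes followed by the data-table row key bytes.
    static byte[] indexKey(String columnValue, String dataRowKey) {
        byte[] v = columnValue.getBytes(StandardCharsets.UTF_8);
        byte[] r = dataRowKey.getBytes(StandardCharsets.UTF_8);
        byte[] key = Arrays.copyOf(v, v.length + r.length);
        System.arraycopy(r, 0, key, v.length, r.length);
        return key;
    }

    // "column value + 1": the smallest key greater than every key with this prefix.
    // Returns an empty array when the prefix is all 0xFF (no upper bound exists).
    static byte[] stopKeyFor(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return new byte[0];
    }

    // Range scan [startKey, endKey) over the stand-in index table;
    // returns the data-table row keys that carry the given column value.
    static List<String> lookup(NavigableMap<byte[], byte[]> indexTable, String columnValue) {
        byte[] start = columnValue.getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopKeyFor(start);
        NavigableMap<byte[], byte[]> range = (stop.length == 0)
                ? indexTable.tailMap(start, true)
                : indexTable.subMap(start, true, stop, false);
        List<String> rows = new ArrayList<>();
        for (byte[] key : range.keySet()) {
            // Strip the column-value prefix to recover the data-table row key.
            rows.add(new String(key, start.length, key.length - start.length,
                    StandardCharsets.UTF_8));
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<byte[], byte[]> index = new TreeMap<>(UNSIGNED);
        index.put(indexKey("red", "row1"), new byte[0]);
        index.put(indexKey("ree", "row2"), new byte[0]);
        index.put(indexKey("blu", "row3"), new byte[0]);
        index.put(indexKey("red", "row7"), new byte[0]);
        System.out.println(lookup(index, "red")); // prints [row1, row7]
    }
}
```

With variable-width values a separator byte between the value and the row
key would be needed, otherwise a scan for "ab" would also match "abc".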

For multi-index queries, I previously ran one scan per indexed column and
intersected the result sets on the client to get the rows that I want. But
the query time was too long, so I decided to move the computation of the
intersection to the server side and reduce the amount of data transferred.
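
For reference, the intersection step itself looks roughly like this. It is
a sketch using only java.util as a stand-in for the pairwise
CollectionUtils.intersection() calls; the class and method names are made
up for the example, and it assumes every per-index result set fits in
memory. Starting from the smallest set and stopping once the result is
empty is an optimization I added for the sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class IndexIntersection {
    // Intersects the row-key sets returned by each index scan.
    // Assumes all sets fit in memory; returns an empty set for empty input.
    static Set<String> intersectAll(List<Set<String>> perIndexRowKeys) {
        if (perIndexRowKeys.isEmpty()) {
            return new HashSet<>();
        }
        // Process the smallest set first so the working set never grows.
        List<Set<String>> bySize = new ArrayList<>(perIndexRowKeys);
        bySize.sort(Comparator.comparingInt(Set::size));

        Set<String> result = new HashSet<>(bySize.get(0));
        for (int i = 1; i < bySize.size() && !result.isEmpty(); i++) {
            result.retainAll(bySize.get(i)); // keep only keys present in both
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> colorIsRed = Set.of("row1", "row7", "row9");
        Set<String> sizeIsLarge = Set.of("row7", "row9", "row12");
        System.out.println(intersectAll(List.of(colorIsRed, sizeIsLarge)).size()); // prints 2
    }
}
```

This does not address the memory question when the index results are very
large; it only shows the operation being moved to the server side.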

Do you have a better idea?

On Mon, May 14, 2012 at 8:17 PM, Michel Segel <michael_segel@hotmail.com> wrote:

> Need a little clarification...
>
> You said that you need to do multi-index queries.
>
> Did you mean to say multiple people running queries at the same time, or
> did you mean you wanted to do multi-key indexes where the key is a
> multi-key part.
>
> Or did you mean that you really wanted to use multiple indexes at the same
> time on a single query?
>
> If it's the latter, not really a good idea...
> How do you handle the intersection of the two sets? (3 sets or more?)
> Can you assume that the indexes are in sort order?
>
> What happens when the results from the indexes exceed the amount of
> allocated memory?
>
> What I am suggesting you to do is to set aside the underpinnings of HBase
> and look at the problem you are trying to solve in general terms.  Not an
> easy one...
>
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On May 14, 2012, at 4:35 AM, fding hbase <fding.hbase@gmail.com> wrote:
>
> > Hi all,
> >
> > Is it possible to use a table scanner (different from the host table
> > region) or execute a coprocessor of another table, in the endpoint
> > coprocessor?
> > It looks like chaining coprocessors. But I found a possible deadlock!
> > Can anyone help me with this?
> >
> > In my testing environment I deployed the 0.92.0 version from CDH.
> > I wrote an Endpoint coprocessor to do composite secondary index queries.
> > The index is stored in another table and the index update is maintained
> > by the client through an extended HTable. While a single index query
> > works fine through Scanners of index table, soon after we realized
> > we need to do multi-index queries at the same time.
> > At first we tried to pull all the row keys matched in each index table
> > and do the merge (just set intersection) on the client,
> > but that overran the network bandwidth. So I proposed to try
> > the endpoint coprocessor. The idea is to use coprocessors, one
> > in the master table (the indexed table) and one for each index table
> > region.
> > Each master table region coprocessor instance invokes the index table
> > coprocessor instances with its regioninfo (the startKey and endKey) and
> > the scan;
> > the index table region coprocessor instance scans and returns the row
> > keys within the range of startKey and endKey passed in.
> >
> > The cluster sometimes blocks when invoking the index table coprocessor.
> > I traced into the code and found that when HConnection locates regions
> > it makes an RPC to the same regionserver.
> >
> > (After a while I found the index table coprocessor is equivalent to
> > just a plain scan with filter, so I switched to scanners with filter, but
> > the problem remains.)
>



-- 

Best Regards!

Fei Ding
fding.church@gmail.com
