hbase-issues mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-3342) Server-side Row-level Inverted Index Join via Coprocessors
Date Mon, 13 Dec 2010 19:45:01 GMT
Server-side Row-level Inverted Index Join via Coprocessors

                 Key: HBASE-3342
                 URL: https://issues.apache.org/jira/browse/HBASE-3342
             Project: HBase
          Issue Type: New Feature
            Reporter: Jonathan Gray

A common schema in HBase is to create an inverted index per row (a la inbox search) where
a row is a user/entity, each column is a word, and versions are instances of that word in
documents (values can be empty or could contain additional scoring info like position / count).

When querying indexes like this, we may want to do something like:  give me the N most recent
documents that contain the word "foo" (exact word matching) and contain a word that starts
with "bar" (prefix matching).
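
To make the query semantics concrete, here is a minimal sketch in plain Python. It does not
use any HBase API. The row is modelled as a hypothetical in-memory dict from word to version
timestamps, and it assumes that each version timestamp identifies exactly one document; the
names Row and query are illustrative only.

```python
from typing import Dict, List

# Hypothetical stand-in for one index row:
# column (word) -> version timestamps, newest first.
# Assumption: each version timestamp identifies one document.
Row = Dict[str, List[int]]


def query(row: Row, exact: str, prefix: str, n: int) -> List[int]:
    """Return the n most recent documents that contain `exact` and
    at least one word starting with `prefix`."""
    exact_docs = set(row.get(exact, []))

    # Union of the documents of every column whose word matches the prefix.
    prefix_docs = set()
    for word, versions in row.items():
        if word.startswith(prefix):
            prefix_docs.update(versions)

    # A document qualifies only if it appears on both sides.
    matches = exact_docs & prefix_docs
    return sorted(matches, reverse=True)[:n]


row = {
    "foo": [90, 70, 50, 30],
    "bar": [90, 40],
    "barn": [50, 20],
    "baz": [70],
}
result = query(row, "foo", "bar", 2)
print(result)
```

This reference version reads every version of every matching column before it can answer,
which is the cost the rest of this issue is concerned with.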

Currently this join has to be done on the client-side, so we may have to read far more than
N documents for each word to be able to get N documents which match for both words.  This
gets worse as the number of words increases.

We could implement this join on the server-side in a coprocessor.
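
As a rough illustration of the join logic such a coprocessor might run, the sketch below
merges the columns in plain Python and stops as soon as N matches are found. It is not the
coprocessor API and not a proposed implementation. It assumes a hypothetical in-memory row
(word to version timestamps, newest first) in which each timestamp identifies one document;
server_side_join is an illustrative name.

```python
import heapq
from typing import Dict, List

# Hypothetical stand-in for one index row:
# column (word) -> version timestamps, newest first.
Row = Dict[str, List[int]]


def server_side_join(row: Row, exact: str, prefix: str, n: int) -> List[int]:
    """Intersect the `exact` column with the merged prefix-matching
    columns, newest first, stopping once n documents are found."""
    exact_iter = iter(row.get(exact, []))

    # Lazily merge all prefix-matching columns into one descending stream.
    prefix_cols = [v for w, v in row.items() if w.startswith(prefix)]
    prefix_iter = heapq.merge(*prefix_cols, reverse=True)

    out: List[int] = []
    a = next(exact_iter, None)
    b = next(prefix_iter, None)
    while a is not None and b is not None and len(out) < n:
        if a == b:
            # Document appears on both sides: emit it and advance both.
            out.append(a)
            a = next(exact_iter, None)
            b = next(prefix_iter, None)
        elif a > b:
            # Exact side is newer; it cannot match anything older on b's side.
            a = next(exact_iter, None)
        else:
            b = next(prefix_iter, None)
    return out


example_row = {
    "foo": [90, 70, 50, 30],
    "bar": [90, 40],
    "barn": [50, 20],
    "baz": [70],
}
top_two = server_side_join(example_row, "foo", "bar", 2)
print(top_two)
```

Because both streams are consumed newest first, the loop can stop after N matches without
reading the older versions, and only the matching documents would need to cross the wire.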

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
