hbase-issues mailing list archives

From "Himanshu Vashishtha (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3607) Cursor functionality for results generated by Coprocessors
Date Tue, 15 Mar 2011 06:00:30 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006804#comment-13006804

Himanshu Vashishtha commented on HBASE-3607:

First, thanks for reviewing it, Stack.
Sorry for not making its requirements very clear in the description.

You asked: What is CursorCallable adding over and above Scanner? It's not clear to me (Pardon

A scanner is to read the raw ("virgin") rows of the table, and one can add filters etc to
do the sieving. A cursor is to traverse a computed resultset, that is a result of some CP
This is useful when, instead of a single value as the post-computation result at
region level (as with the aggregate functions), the result set is a bunch of rows. The cursor
provides a mechanism to consume this computed result set (by sending it to the client
piecewise) and, if necessary, to ask the CP to produce more of the result. It therefore
supports two types of ResultSets: Incremental and InMemory.
Incremental: In this case, results can be generated on a per-row (or per-group-of-rows) basis,
as in the test case used in the patch. If a client asks for 100 rows in one RPC,
the corresponding cursor object will return exactly that many rows in the next call.
InMemory: This is like computing the top K rows in one region. Here, the result set _has_ to be
precomputed before the cursor object is instantiated and the handle is given to the client.
Once the result set is created, a cursor object is created. Invoking next()-like methods will
only consume the result set (as it has already been computed over the entire region).
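
To make the distinction concrete, here is a minimal sketch in plain Java. It does not use the HBase API and is not the code in the patch; the names RowCursor, IncrementalCursor, InMemoryCursor and the next(int) signature are hypothetical, chosen only to illustrate the two kinds of result set described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class CursorSketch {
    /** Hypothetical cursor contract: hand back at most n rows per call. */
    interface RowCursor<T> {
        List<T> next(int n);
    }

    /** Incremental: rows are pulled from the underlying source only when asked for. */
    static final class IncrementalCursor<T> implements RowCursor<T> {
        private final Iterator<T> source;

        IncrementalCursor(Iterator<T> source) {
            this.source = source;
        }

        @Override
        public List<T> next(int n) {
            List<T> batch = new ArrayList<>();
            while (batch.size() < n && source.hasNext()) {
                batch.add(source.next());
            }
            return batch;
        }
    }

    /** InMemory: the whole result (here, the top K rows) is computed before the cursor exists. */
    static final class InMemoryCursor<T> implements RowCursor<T> {
        private final List<T> results;
        private int position = 0;

        InMemoryCursor(List<T> regionRows, Comparator<T> order, int k) {
            // Sort a copy of the region's rows and keep only the first k.
            List<T> sorted = new ArrayList<>(regionRows);
            sorted.sort(order);
            this.results = new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
        }

        @Override
        public List<T> next(int n) {
            // Only consumes the precomputed list; no further computation happens here.
            int end = Math.min(position + n, results.size());
            List<T> batch = new ArrayList<>(results.subList(position, end));
            position = end;
            return batch;
        }
    }
}
```

In this sketch both cursors return an empty list once the result set is exhausted. The only difference is where the work happens: inside next() for the incremental case, and inside the constructor for the in-memory case.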

Hope this clarification will be useful.

Yes, in the current patch it is fail-fast in case of a region split (it just abandons the process
and leaves it to the client to re-submit the request).

> Cursor functionality for results generated by Coprocessors
> ----------------------------------------------------------
>                 Key: HBASE-3607
>                 URL: https://issues.apache.org/jira/browse/HBASE-3607
>             Project: HBase
>          Issue Type: New Feature
>          Components: coprocessors
>            Reporter: Himanshu Vashishtha
>         Attachments: patch-2.txt
> I tried to come up with scanner-like functionality for results generated by coprocessors
at region level.
> This is just a PoC, and it will be good to have your comments on it.
> It has support for both Incremental and In-memory Result sets. Attached is a patch that
has a test case for an incremental result (i.e., the client receives a cursorId from the CP core
method, instantiates a cursor object and iterates over the result set). The client can set a cache
limit on the CursorCallable object to reduce the number of RPCs, just like scanners.
> In its current state, it has some limitations too :)). For example, it is region specific only,
i.e., one can instantiate and use a cursor at one region only (and that region is determined
by the input row while instantiating the cursor). I will try to expand it so that it can have
at least sequential access to other regions, but as I said, I want the opinion of experts
to know whether this approach really makes sense or not.
> I have tested it with the inbuilt testing framework on my laptop only.
> It will be good if I copy the use case here in the description too:
> Test table has rows like:
>  /**
>    * The scenario is that I have these rows keys in the test table:
>   'aaa-123'
>   'aaa-456'
>   'abc-111'
>   'abd-111'
>   'abd-222'
>   & I want to return:
>   ('aaa', 2)
>   ('abc', 1)
>   ('abd', 2)
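
As an illustration of the use case quoted above, the following plain-Java sketch counts row keys per prefix. It assumes the prefix is everything before the first '-' and that keys arrive in sorted order, as a region scan would deliver them; the class and method names are hypothetical and this is not the test case from the patch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PrefixCount {
    /**
     * Counts row keys per prefix, where the prefix is the part before the first '-'.
     * A LinkedHashMap keeps the prefixes in the order they were first seen.
     */
    static Map<String, Integer> countByPrefix(List<String> sortedRowKeys) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String key : sortedRowKeys) {
            int dash = key.indexOf('-');
            // Keys without a '-' are treated as their own prefix (an assumption).
            String prefix = dash < 0 ? key : key.substring(0, dash);
            counts.merge(prefix, 1, Integer::sum);
        }
        return counts;
    }
}
```

Called with the five keys listed above, countByPrefix returns the map {aaa=2, abc=1, abd=2}. An incremental cursor could hand these (prefix, count) pairs to the client in batches, since each pair is final as soon as the scan moves on to the next prefix.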
