hbase-issues mailing list archives

From "Alex Baranau (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2038) Coprocessors: Region level indexing
Date Sun, 21 Nov 2010 18:00:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934308#action_12934308
] 

Alex Baranau commented on HBASE-2038:
-------------------------------------

Hello,

Now that the first cut of the Coprocessors (CP) implementation has been committed to trunk
(HBASE-2001 and HBASE-2002), I think there's a good opportunity to get going with this issue.
I hope that a CP-based implementation of region-level indexing will confirm that the CP API
is complete and has everything one might need (for now).

I revised the design/approach of the IHBase contrib and have several questions about
transforming the code into a CP-based implementation. It would be great if someone could
help me with them!

1) Are coprocessors meant to be stateless? If not, then I assume that one instance is created
and "assigned" to a region, and that a CP implementation should be thread-safe (e.g. multiple
scanners can be handled at the same time for the region). Otherwise, if coprocessors are
meant to be stateless, I believe that CoprocessorEnvironment's get/put/remove methods are
meant for storing intermediate data (aka attributes) between method calls (if we really need
it). Is one CoprocessorEnvironment instance created per region? I know that, e.g., I can store
some scan-related data keyed by the scanId passed to the scan-related callbacks (is that
safe?), but what about region-related data (no problem there if the CP env is one-per-region)?
In general, do I understand the CP API correctly (based on the assumptions I share in this point)?
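
To make the "stateful but thread-safe" option concrete, here is a minimal sketch of how one
shared per-region instance could keep per-scan state keyed by scanId. It deliberately uses
no HBase classes: ScanStateRegistry and its onScannerOpen/onScannerNext/onScannerClose
methods are hypothetical names standing in for whatever the scan-related callbacks turn out
to be, not the actual CP API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical per-region holder of per-scan state. Assumes a single instance is
 * shared by all scanners of a region, so every access must be thread-safe.
 */
final class ScanStateRegistry {
    // scanId -> number of rows returned so far by that scan (illustrative state only)
    private final ConcurrentMap<Long, AtomicLong> rowsReturned = new ConcurrentHashMap<>();

    /** Called when a scanner is opened: registers fresh state for this scanId. */
    void onScannerOpen(long scanId) {
        rowsReturned.put(scanId, new AtomicLong());
    }

    /** Called on each next(): updates and returns the running total for this scan. */
    long onScannerNext(long scanId, int rowsInThisCall) {
        AtomicLong counter = rowsReturned.get(scanId);
        if (counter == null) {
            throw new IllegalStateException("unknown scanId " + scanId);
        }
        return counter.addAndGet(rowsInThisCall);
    }

    /** Called when a scanner is closed: drops the state so it does not leak. */
    void onScannerClose(long scanId) {
        rowsReturned.remove(scanId);
    }

    /** Number of scans that currently have state registered. */
    int openScanners() {
        return rowsReturned.size();
    }
}
```

Region-wide data would live in plain fields of the same instance under the one-per-region
assumption; if instances turn out to be shared more widely, it would have to be keyed by
region as well.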

2) During a batch scan (something that was added in trunk but wasn't supported in previous
HBase versions, and hence the current IHBase implementation doesn't take it into account) we
need to return multiple rows from the scan's next() method. It looks like if we apply the
current approach (from the current IHBase implementation) of "fast forwarding" to the next
value, we'll only fast-forward the scan to the *first* of the values to return. The others
will be fetched using the "usual" scan logic without using the index, which isn't efficient.
There's not a lot we can do without changing the scan (and deeper) code. Am I right here?
Perhaps it's OK to lack support for batch reads in the first version of CP-based IHBase? Or
it might be that we should change the approach?
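
The concern can be illustrated with a small stand-alone simulation. This is only a sketch
under stated assumptions: BatchSeekSketch, seekOnce and seekPerRow are hypothetical names,
sorted sets of row keys stand in for the region's rows and for the index, and nothing here
is IHBase or HBase code. It compares the number of stored rows examined when the index is
used only for the first row of a batch versus for every row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Hypothetical simulation of index-assisted batch reads over sorted row keys. */
final class BatchSeekSketch {
    /** Rows returned by one next() call plus the number of stored rows examined. */
    static final class Result {
        final List<String> rows = new ArrayList<>();
        int examined;
    }

    /** Fast-forward to the first matching row only, then scan sequentially. */
    static Result seekOnce(NavigableSet<String> regionRows,
                           NavigableSet<String> indexed, String start, int batch) {
        Result r = new Result();
        String first = indexed.ceiling(start); // the single index lookup
        if (first == null) {
            return r;
        }
        for (String row : regionRows.tailSet(first, true)) {
            if (r.rows.size() == batch) {
                break;
            }
            r.examined++;
            // Stands in for evaluating the scan's filter against the row itself.
            if (indexed.contains(row)) {
                r.rows.add(row);
            }
        }
        return r;
    }

    /** Consult the index before every row, so only matching rows are touched. */
    static Result seekPerRow(NavigableSet<String> indexed, String start, int batch) {
        Result r = new Result();
        String next = indexed.ceiling(start);
        while (next != null && r.rows.size() < batch) {
            r.examined++;
            r.rows.add(next);
            next = indexed.higher(next); // index lookup for the following match
        }
        return r;
    }
}
```

Both methods return the same rows; the difference is only in how many rows are examined,
which grows with the gaps between matching rows in the seek-once case.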

3) Is it in general a good idea for me to take this initiative (transforming the IHBase
implementation into a CP-based one)? I fear that, due to the many changes in the HBase
codebase (trunk versus e.g. 0.20.5), there are going to be severe changes in the approach/design
of the index implementation (compared to the current one, which I could use as a base), so
poking you guys (HBase devs) *a lot* (if really needed) to learn about it might not be a very
efficient way to work on this issue :) Anyway, I'd be glad to work on the issue if someone can
provide the needed guidance.

4) I haven't dug into the THBase contrib (as I have into IHBase). Will these contribs (IHBase
and THBase) be "transferred" to a CP-based implementation as a single effort? I believe they
won't be merged, given how differently they act now. Was it really meant to put the tasks for
*both* into a single JIRA issue?

Thank you!

> Coprocessors: Region level indexing
> -----------------------------------
>
>                 Key: HBASE-2038
>                 URL: https://issues.apache.org/jira/browse/HBASE-2038
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Priority: Minor
>
> HBASE-2037 is a good candidate to be done as a coprocessor. It also serves as a good goalpost
> for coprocessor environment design -- there should be enough of it so region level indexing
> can be reimplemented as a coprocessor without any loss of functionality.
