hbase-issues mailing list archives

From "Gary Helmling (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6805) Extend co-processor framework to provide observers for filter operations
Date Tue, 02 Oct 2012 21:25:08 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468086#comment-13468086 ]

Gary Helmling commented on HBASE-6805:
--------------------------------------

Looking at the encryption example here, it seems like you could provide that with the existing
coprocessor hooks:

Option A:
# In {{EncryptingRegionObserver.preScannerOpen()}}, if any Filter is set on the Scan object,
wrap it in a custom {{DecryptingFilterWrapper}}.  This would just decrypt the KVs before passing
them on to the client provided Filter, essentially doing the same work your example preFilterXXX
methods are doing.
# In {{EncryptingRegionObserver.postScannerNext()}}, again decrypt the final KVs being returned
to the client, same as your example.

The duplicate decryption here seems unnecessary, but it should give you the same results as
your provided example, without the need to add a batch of pre/postFilterXXX hooks to RegionObservers.
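
As a rough illustration of the wrapping idea in Option A, here is a minimal, self-contained sketch. It does not use the real HBase classes: {{KvFilter}} is a hypothetical one-method stand-in for the Filter API, {{XorCipher}} is a toy placeholder for real decryption, and the way the wrapper would be installed from {{preScannerOpen()}} is assumed rather than shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the HBase Filter API; only the single
// include/exclude decision needed for this sketch is modelled.
interface KvFilter {
    /** Returns true if the value should be included in the result. */
    boolean include(byte[] value);
}

// Wraps a client-provided filter and decrypts each value before
// delegating, so the client filter only ever sees plaintext.
final class DecryptingFilterWrapper implements KvFilter {
    private final KvFilter delegate;
    private final UnaryOperator<byte[]> decryptor;

    DecryptingFilterWrapper(KvFilter delegate, UnaryOperator<byte[]> decryptor) {
        this.delegate = delegate;
        this.decryptor = decryptor;
    }

    @Override
    public boolean include(byte[] encryptedValue) {
        return delegate.include(decryptor.apply(encryptedValue));
    }
}

// Toy cipher for illustration only (assumption): XOR with a single-byte key.
// XOR is its own inverse, so the same call encrypts and decrypts.
// This is NOT real encryption.
final class XorCipher {
    static byte[] apply(byte[] in, byte key) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) (in[i] ^ key);
        }
        return out;
    }
}
```

In this sketch the client filter is untouched; the observer would only have to replace the Scan's filter with the wrapper. The second decryption in {{postScannerNext()}} would still be needed for the values actually returned, which is the duplicate work mentioned above.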

Option B:
# In {{EncryptingRegionObserver.preStoreScannerOpen()}} return a custom KeyValueScanner implementation
that extends or wraps the default StoreScanner implementation.  Note that this would still
be a little tricky since filters are applied down in ScanQueryMatcher.  For decryption what
you would really want is to hook in above the StoreFileScanners and MemStoreScanners used
internally by StoreScanner, but below the ScanQueryMatcher operations, so that you can decrypt
each KV once as it's read.  Seems like that would currently require duplicating a fair amount
of StoreScanner functionality.  Maybe something needs to be added to better hook in to this
data reading layer?
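
To make the "decrypt each KV once as it's read" idea concrete, here is a small decorator sketch. Everything in it is hypothetical: {{RawScanner}} stands in for the role played by the low-level store file and memstore scanners, {{ListScanner}} is a test-only source, and no such hook point exists in StoreScanner today. It only shows the shape of a layer that sits below filtering and above the raw reads.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a low-level scanner that yields raw values.
interface RawScanner {
    /** Returns the next raw value, or null when the scanner is exhausted. */
    byte[] next();
}

// Simple in-memory source, used here only to drive the sketch.
final class ListScanner implements RawScanner {
    private final Iterator<byte[]> it;

    ListScanner(List<byte[]> values) {
        this.it = values.iterator();
    }

    @Override
    public byte[] next() {
        return it.hasNext() ? it.next() : null;
    }
}

// Decorator that decrypts each value exactly once as it is read, so that
// anything layered above it (query matching, filters) sees plaintext.
final class DecryptingScanner implements RawScanner {
    private final RawScanner inner;
    private final UnaryOperator<byte[]> decryptor;
    int decryptCount = 0; // exposed only to show the once-per-value property

    DecryptingScanner(RawScanner inner, UnaryOperator<byte[]> decryptor) {
        this.inner = inner;
        this.decryptor = decryptor;
    }

    @Override
    public byte[] next() {
        byte[] raw = inner.next();
        if (raw == null) {
            return null; // end of scan, nothing to decrypt
        }
        decryptCount++;
        return decryptor.apply(raw);
    }
}
```

With a layer like this, filters would not need to know about encryption at all, and nothing would need to be decrypted a second time on the way back to the client.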

The main issue I see is that the added hooks fuzz the line between Filters and RegionObservers
and their areas of responsibility.  It doesn't seem like we should really need pre/postFilterXXX
hooks, because that's what filters are supposed to provide.  And of course adding more Observer
hooks does have a cost in increasing complexity of the coprocessor interfaces and added overhead
(especially in hot code paths).

Are there really cases that require the pre/postFilter hooks that can't be accomplished by
having a RegionObserver wrap gets/scans with its own Filter implementation that coordinates
with the RegionObserver instance?
                
> Extend co-processor framework to provide observers for filter operations
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6805
>                 URL: https://issues.apache.org/jira/browse/HBASE-6805
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Coprocessors
>    Affects Versions: 0.96.0
>            Reporter: Jason Dai
>         Attachments: extend_coprocessor.patch
>
>
> There are several filter operations (e.g., filterKeyValue, filterRow, transform, etc.)
> at the region server side that either exclude KVs from the returned results, or transform
> the returned KV. We need to provide observers (e.g., preFilterKeyValue and postFilterKeyValue)
> for these operations in the same way as the observers for other data access operations (e.g.,
> preGet and postGet). This extension is needed to support DOT (e.g., extracting individual
> fields from the document in the observers before passing them to the related filter operations)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
