hbase-issues mailing list archives

From "George P. Stathis (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2426) [Transactional Contrib] Introduce quick scanning row-based secondary indexes
Date Thu, 15 Apr 2010 22:06:49 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857565#action_12857565 ]

George P. Stathis commented on HBASE-2426:


Thank you for the feedback. After more consideration this past week and after reviewing
your comment, I think I should retract any speed-improvement claims regarding this patch.
I started on this contrib focusing solely on scans, but I naively neglected to account for
the column iterations. If one takes everything into account, I agree that there should
not be much of a speed difference between what's already there and the RowBasedIndexSpecification.
Maybe, in terms of IO, fetching a row and then iterating in memory instead of scanning
through files has an edge, but I'm not sure about this either; I'm still new to this
technology stack and I don't know whether scanning through more rows means going
through more files. Some actual performance tests should be run to see if that statement even
holds (or someone more knowledgeable like you should set me straight :-) ).

So, at the very least, the JavaDoc should be amended to reflect this.

As it turns out, though, this contrib is definitely useful when used in conjunction with
https://issues.apache.org/jira/browse/HBASE-2438.
Since there is currently no reliable way to paginate through rows, a row-based indexing approach
can at least guarantee that the pages returned contain the number of rows requested. Our application
does leverage pagination, so we will be able to use this, at least until reliable row-based
pagination comes along. After that, it may be six of one and half a dozen of the other. One
thing that the new contrib does not offer over the current solution is the ability to store
additional column values in the index for further filtering. This might be a deal-breaker
for some folks.
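
To make the pagination point concrete, here is a minimal in-memory sketch in plain Java. It
does not use the HBase API or any code from the patch: the class and method names are
hypothetical, and String keys stand in for the byte[] row keys and qualifiers HBase actually
sorts. It only models the layout described in the issue (one index row per indexed value,
with the base table row keys as its column qualifiers) and shows why a page taken from a
single index row holds exactly the requested number of row keys whenever that many remain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory model only; none of these names come from the patch or from HBase. */
class RowBasedIndexSketch {
    // Index row key (the indexed column value) -> sorted base table row keys,
    // standing in for the column qualifiers stored on that index row.
    private final TreeMap<String, TreeSet<String>> indexRows = new TreeMap<>();

    /** Record that base row baseRowKey holds indexedValue in the indexed column. */
    void put(String indexedValue, String baseRowKey) {
        indexRows.computeIfAbsent(indexedValue, k -> new TreeSet<>()).add(baseRowKey);
    }

    /**
     * Return up to pageSize base row keys for indexedValue, strictly after
     * lastSeen (pass null to get the first page).
     */
    List<String> page(String indexedValue, String lastSeen, int pageSize) {
        List<String> result = new ArrayList<>();
        TreeSet<String> qualifiers = indexRows.get(indexedValue);
        if (qualifiers == null) {
            return result; // no base row has this value
        }
        NavigableSet<String> remaining =
            lastSeen == null ? qualifiers : qualifiers.tailSet(lastSeen, false);
        for (String baseRowKey : remaining) {
            if (result.size() == pageSize) {
                break;
            }
            result.add(baseRowKey);
        }
        return result;
    }

    public static void main(String[] args) {
        RowBasedIndexSketch index = new RowBasedIndexSketch();
        for (String row : new String[] {"row3", "row1", "row5", "row2", "row4"}) {
            index.put("X", row);
        }
        List<String> first = index.page("X", null, 2);
        List<String> second = index.page("X", first.get(first.size() - 1), 2);
        System.out.println(first + " " + second);
    }
}
```

The caller passes the last key of the previous page to get the next one, so each page is a
single-row read followed by an in-memory walk over sorted qualifiers. Whether that is any
faster than a scan is exactly the open question above; the sketch only illustrates the
page-size guarantee.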

Let me know what you think. If people don't have any use for this except for column-based
pagination, maybe it's not worth adding.

> [Transactional Contrib] Introduce quick scanning row-based secondary indexes
> ----------------------------------------------------------------------------
>                 Key: HBASE-2426
>                 URL: https://issues.apache.org/jira/browse/HBASE-2426
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: contrib
>            Reporter: George P. Stathis
>            Priority: Minor
>             Fix For: 0.20.5, 0.21.0
>         Attachments: hbase-2426-0.20-branch.patch
> RowBasedIndexSpecification is a specialized IndexSpecification class for creating row-based
> secondary index tables. Base table rows with the same indexed column value have their row
> keys stored as column qualifiers on the same secondary index table row. The key for that row
> is the indexed column value from the base table. This avoids expensive secondary
> index table scans and provides faster access for applications such as foreign key indexing
> or queries such as "find all table A rows whose familyA:columnB value is X". RowBasedIndexSpecification
> indices can be scanned using the API on RowBasedIndexedTable. The metadata for RowBasedIndexSpecification
> differ from IndexSpecification in that:
> - Only a single base table column can be indexed per RowBasedIndexSpecification. No additional
> columns are put in the index table.
> and
> - RowBasedIndexKeyGenerator, which constructs the index-row-key from the indexed column
> value in the original column, is always used.
> For a simple RowBasedIndexSpecification example, look at the TestRowBasedIndexedTable
> unit test in org.apache.hadoop.hbase.client.tableIndexed.
> To enable RowBasedIndexSpecification indexing, modify hbase-site.xml to turn on the
> IndexedRegionServer.  This is done by setting
> - hbase.regionserver.class to org.apache.hadoop.hbase.ipc.IndexedRegionInterface and
> - hbase.regionserver.impl to org.apache.hadoop.hbase.regionserver.tableindexed.RowBasedIndexedRegionServer
