hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2037) Alternate indexed hbase implementation; speeds scans by adding indexes to regions rather secondary tables
Date Tue, 05 Jan 2010 07:14:54 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796567#action_12796567 ]

stack commented on HBASE-2037:

All contrib tests, including all of the new ones, are also passing, and the core test
failure above looks transient:

Testsuite: org.apache.hadoop.hbase.regionserver.TestGetDeleteTracker
Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.056 sec
------------- Standard Output ---------------
Qf col, timestamp, 1262675527734260000, type Delete
Qf col, timestamp, 1262675527734259000, type DeleteColumn
Qf col2, timestamp, 1262675527734259000, type Delete
------------- ---------------- ---------------

Testcase: testUpdate_CompareDeletes took 0.003 sec
Testcase: testUpdate took 0.002 sec
Testcase: testIsDeleted_NotDeleted took 0 sec
Testcase: testIsDeleted_Delete took 0 sec
Testcase: testIsDeleted_DeleteColumn took 0 sec
Testcase: testIsDeleted_DeleteFamily took 0 sec
Testcase: testStackOverflow took 0.037 sec

> Alternate indexed hbase implementation; speeds scans by adding indexes to regions rather
secondary tables
> ---------------------------------------------------------------------------------------------------------
>                 Key: HBASE-2037
>                 URL: https://issues.apache.org/jira/browse/HBASE-2037
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: stack
>             Fix For: 0.20.3
>         Attachments: idx-hbase2.patch, idx-hbase3.patch
> Purpose
> The goal of the indexed HBase contrib is to speed up scans by indexing HBase columns.
Indexed HBase (IHBase) differs from the indexed tables in transactional HBase (ITHBase):
the indexes in ITHBase are themselves HBase tables that use the indexed column's values
as row keys, whereas IHBase creates indexes at the region level. The differences are summarized below:
> + global ordering
> ITHBase: yes
> IHBase: no
> Comment: IHBase has an index for each region. The flip side of not having global ordering
is compatibility with the good old HRegion: results come back in row order (and not
value order as in ITHBase)
> + Full table scan?
> ITHBase: no
> IHBase: no
> Comment: ITHbase does a partial scan on the index table. IHbase supports specifying start/end
rows to limit the number of scanned regions
> + Multiple Index Usage
> ITHBase: no
> IHBase: yes
> Comment: IHBase can take advantage of multiple indexes in the same scan. The IHBase IdxScan
object accepts an Expression which allows the intersection/union of several indexed
column criteria
> + Extra disk storage
> ITHBase: yes
> IHBase: no
> Comment: IHbase indexes are created when the region starts/flushes and do not require
any extra storage
> + Extra RAM
> ITHBase: yes
> IHBase: yes
> Comment: IHBase indexes are held in memory and hence increase the memory overhead. ITHBase
indexes increase the number of regions each region server has to support, which also costs memory
> + Parallel scanning support
> ITHBase: no
> IHBase: yes
> Comment: In ITHBase the index table needs to be consulted and then GETs are issued for each matching
row. The behavior of IHBase (as perceived by the client) is no different from a regular scan
and hence supports parallel scanning seamlessly. Parallel GETs could be implemented to speed up
ITHBase scans
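
To illustrate the "Multiple Index Usage" item above, here is a minimal sketch of how several indexed column criteria might be combined by intersection and union. Everything in it is hypothetical: the class IndexExpressionSketch, the Expr interface, and the eq/and/or helpers are illustration names and are not the IHBase IdxScan or Expression API. Each index is modelled as a plain in-memory map from column value to a sorted set of row keys, so results come back in row order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch; these names are assumptions, not the IHBase API. */
final class IndexExpressionSketch {

    /** An expression node evaluated against per-column indexes (column -> value -> row keys). */
    interface Expr {
        SortedSet<String> eval(Map<String, Map<String, SortedSet<String>>> indexes);
    }

    /** Adds one (column, value, rowKey) entry to the in-memory indexes. */
    static void index(Map<String, Map<String, SortedSet<String>>> indexes,
                      String column, String value, String rowKey) {
        indexes.computeIfAbsent(column, c -> new HashMap<>())
               .computeIfAbsent(value, v -> new TreeSet<>())
               .add(rowKey);
    }

    /** Leaf criterion: rows whose indexed column equals the given value. */
    static Expr eq(String column, String value) {
        return indexes -> {
            SortedSet<String> result = new TreeSet<>();
            Map<String, SortedSet<String>> columnIndex = indexes.get(column);
            if (columnIndex != null && columnIndex.containsKey(value)) {
                result.addAll(columnIndex.get(value));
            }
            return result;
        };
    }

    /** Intersection of two criteria. */
    static Expr and(Expr left, Expr right) {
        return indexes -> {
            SortedSet<String> result = new TreeSet<>(left.eval(indexes));
            result.retainAll(right.eval(indexes));
            return result;
        };
    }

    /** Union of two criteria. */
    static Expr or(Expr left, Expr right) {
        return indexes -> {
            SortedSet<String> result = new TreeSet<>(left.eval(indexes));
            result.addAll(right.eval(indexes));
            return result;
        };
    }

    public static void main(String[] args) {
        Map<String, Map<String, SortedSet<String>>> indexes = new HashMap<>();
        index(indexes, "color", "red", "row1");
        index(indexes, "color", "red", "row2");
        index(indexes, "color", "red", "row4");
        index(indexes, "size", "L", "row2");
        index(indexes, "size", "M", "row4");
        index(indexes, "size", "M", "row5");

        // color = red AND (size = L OR size = M)
        Expr expr = and(eq("color", "red"), or(eq("size", "L"), eq("size", "M")));
        System.out.println(expr.eval(indexes)); // prints [row2, row4]
    }
}
```

Because the sketch keeps row keys in sorted sets, the combined result is already in row order, which matches the ordering behavior described for IHBase in the comparison above.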
> Why IHbase should outperform ITHBase
> 1. More flexible:
>    a. Supports range queries and multi-index queries
>    b. Supports different types, not only byte arrays
> 2. Less overhead: ITHbase pays at least two 'table roundtrips' - one for the index table
and the other for the main table
> 3. Quicker index expression evaluation: IHBase uses dedicated index data structures
while ITHBase uses the regular HRegion scan facilities
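
As a rough picture of what a dedicated, typed index structure can offer for range queries (points 1 and 3 above), the sketch below indexes one long-valued column in a sorted map. This is only an assumption about the general idea: RegionColumnIndexSketch and its methods are hypothetical names, and the actual IHBase index structures are not described in this message.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical per-region index over one long-typed column; not the IHBase API. */
final class RegionColumnIndexSketch {
    // Sorted map from column value to the row keys that hold that value.
    private final NavigableMap<Long, List<String>> valueToRows = new TreeMap<>();

    /** Records that the given row holds the given column value. */
    void put(String rowKey, long value) {
        valueToRows.computeIfAbsent(value, v -> new ArrayList<>()).add(rowKey);
    }

    /** Returns the row keys whose value lies in [low, high], sorted in row order. */
    List<String> range(long low, long high) {
        List<String> rows = new ArrayList<>();
        // subMap only visits the matching slice of the sorted map (requires low <= high).
        for (List<String> bucket : valueToRows.subMap(low, true, high, true).values()) {
            rows.addAll(bucket);
        }
        Collections.sort(rows);
        return rows;
    }

    public static void main(String[] args) {
        RegionColumnIndexSketch index = new RegionColumnIndexSketch();
        index.put("row3", 15L);
        index.put("row1", 20L);
        index.put("row2", 5L);
        System.out.println(index.range(10L, 20L)); // prints [row1, row3]
    }
}
```

The values are compared as numbers rather than as raw byte arrays, which is one way to read the "different types" point; the range lookup touches only the matching part of the index instead of scanning every row.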
> Implementation notes
> • Only index Storefiles. Every index scan performs a full memstore scan. Indexing the
memstore will be implemented only if scanning the memstore proves to be a performance bottleneck
> • Index expression evaluation is performed using bit sets. There are two types of bitsets:
compressed and expanded. An index will typically store a compressed bitset while an expression
evaluator will most probably use an expanded bitset
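
The bitset-based evaluation described above can be sketched with the standard java.util.BitSet standing in for an "expanded" bitset. This is an illustration under assumptions: the meaning of each bit (here, the ordinal of a row within the region's indexed data) and the class name BitSetEvalSketch are made up for the example, and the compressed representation is not modelled at all.

```java
import java.util.BitSet;

/** Hypothetical sketch of index expression evaluation over bitsets; not IHBase code. */
final class BitSetEvalSketch {

    /** Intersection: rows that match both criteria. Inputs are left unmodified. */
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    /** Union: rows that match either criterion. Inputs are left unmodified. */
    static BitSet or(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    /** Builds a bitset with the given row ordinals set (assumed meaning of each bit). */
    static BitSet of(int... ordinals) {
        BitSet bits = new BitSet();
        for (int ordinal : ordinals) {
            bits.set(ordinal);
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet matchesFirstIndex = of(1, 3, 5);
        BitSet matchesSecondIndex = of(3, 4, 5);
        System.out.println(and(matchesFirstIndex, matchesSecondIndex)); // prints {3, 5}
        System.out.println(or(matchesFirstIndex, matchesSecondIndex));  // prints {1, 3, 4, 5}
    }
}
```

Cloning before each operation keeps the per-index bitsets reusable across scans, at the cost of an extra allocation per expression node.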
> + TODO
> This patch changes some of the HBase core so that a class other than the default HRegion can be instantiated.
 It also fixes bugs in filters.
> Would like to add this as a contrib. package on 0.20 branch in time for 0.20.3 if possible.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
