Mailing-List: contact hbase-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hbase-dev@hadoop.apache.org
Message-ID: <1717744375.75441262806616357.JavaMail.jira@brutus.apache.org>
Date: Wed, 6 Jan 2010 19:36:56 +0000 (UTC)
From: "stack (JIRA)" <jira@apache.org>
To: hbase-dev@hadoop.apache.org
Subject: [jira] Commented: (HBASE-2037) Alternate indexed hbase
 implementation; speeds scans by adding indexes to regions rather secondary
 tables
In-Reply-To: <156634733.1260473298295.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/HBASE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797266#action_12797266 ]

stack commented on HBASE-2037:
------------------------------

I made HBASE-2092 a blocker on 0.20.3.

> Alternate indexed hbase implementation; speeds scans by adding indexes to regions rather secondary tables
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2037
>                 URL: https://issues.apache.org/jira/browse/HBASE-2037
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: stack
>             Fix For: 0.20.3
>
>         Attachments: idx-hbase2.patch, idx-hbase3.patch, index.html
>
>
> Purpose
> The goal of the indexed HBase contrib is to speed up scans by indexing HBase columns. Indexed HBase (IHBase) differs from the indexed tables in transactional HBase (ITHBase): while the indexes in ITHBase are, in fact, HBase tables that use the indexed column's values as row keys, IHBase creates indexes at the region level. The differences are summarized below.
> + Global ordering
> ITHBase: yes
> IHBase: no
> Comment: IHBase has an index for each region. The flip side of not having global ordering is compatibility with the good old HRegion: results come back in row order (and not in value order as in ITHBase).
> + Full table scan?
> ITHBase: no
> IHBase: no
> Comment: ITHBase does a partial scan on the index table. IHBase supports specifying start/end rows to limit the number of scanned regions.
> + Multiple Index Usage
> ITHBase: no
> IHBase: yes
> Comment: IHBase can take advantage of multiple indexes in the same scan. The IHBase IdxScan object accepts an Expression, which allows the intersection/union of several indexed column criteria.
> + Extra disk storage
> ITHBase: yes
> IHBase: no
> Comment: IHBase indexes are created when the region starts/flushes and do not require any extra disk storage.
> + Extra RAM
> ITHBase: yes
> IHBase: yes
> Comment: IHBase indexes are held in memory and hence increase the memory overhead. ITHBase indexes increase the number of regions each region server has to support, which costs memory too.
> + Parallel scanning support
> ITHBase: no
> IHBase: yes
> Comment: In ITHBase the index table needs to be consulted first, and then GETs are issued for each matching row. The behavior of IHBase (as perceived by the client) is no different from a regular scan and hence supports parallel scanning seamlessly. Parallel GETs could be implemented to speed up ITHBase scans.
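A minimal sketch of the multiple-index point in the comparison above, assuming each column index can report its matching rows as a java.util.BitSet of row ordinals within one region. The names Expr, matches, and, or and rows are hypothetical; they are not the IdxScan/Expression API from the patch.

```java
import java.util.BitSet;

// Hypothetical sketch: none of these names come from the IHBase patch.
final class IndexExpressionSketch {

    /** An index expression that evaluates to the set of matching row ordinals. */
    interface Expr {
        BitSet evaluate();
    }

    /** Leaf: the rows that one column index reports for one criterion. */
    static Expr matches(BitSet rowsFromIndex) {
        // Clone so that and/or never mutate the bitset owned by the index.
        return () -> (BitSet) rowsFromIndex.clone();
    }

    /** Intersection: rows that satisfy both sub-expressions. */
    static Expr and(Expr left, Expr right) {
        return () -> {
            BitSet result = left.evaluate();
            result.and(right.evaluate());
            return result;
        };
    }

    /** Union: rows that satisfy at least one sub-expression. */
    static Expr or(Expr left, Expr right) {
        return () -> {
            BitSet result = left.evaluate();
            result.or(right.evaluate());
            return result;
        };
    }

    /** Helper that builds a bitset from row ordinals (illustration only). */
    static BitSet rows(int... ordinals) {
        BitSet bits = new BitSet();
        for (int ordinal : ordinals) {
            bits.set(ordinal);
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet priceBelow10 = rows(0, 2, 3, 7); // made-up index results
        BitSet colorRed = rows(2, 3, 5);
        BitSet colorBlue = rows(7, 8);

        // price < 10 AND (color = red OR color = blue)
        Expr expr = and(matches(priceBelow10), or(matches(colorRed), matches(colorBlue)));
        System.out.println(expr.evaluate()); // prints {2, 3, 7}
    }
}
```

Because the result is a set of row ordinals, walking its set bits in ascending order visits the matching rows in row order, which fits the row-ordered results described above.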
> Why IHBase should outperform ITHBase
> 1. More flexible:
>    a. Supports range queries and multi-index queries
>    b. Supports different types - not only byte arrays
> 2. Less overhead: ITHBase pays at least two 'table roundtrips' - one for the index table and the other for the main table
> 3. Quicker index expression evaluation: IHBase uses dedicated index data structures, while ITHBase uses the regular HRegion scan facilities
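To illustrate points 1 and 3, here is a rough sketch of a dedicated per-region index over a typed (long) column that answers a range query. It assumes the index maps each column value to a bitset of row ordinals; the class RegionColumnIndexSketch and its methods are hypothetical and say nothing about the data structures the patch actually uses.

```java
import java.util.BitSet;
import java.util.TreeMap;

// Hypothetical sketch of a region-level index on a long-typed column.
final class RegionColumnIndexSketch {

    // Column value -> ordinals of the rows in this region that hold it.
    // TreeMap keeps the values sorted, which makes range lookups cheap.
    private final TreeMap<Long, BitSet> rowsByValue = new TreeMap<>();

    /** Records that the row with the given ordinal holds the given value. */
    void add(int rowOrdinal, long value) {
        rowsByValue.computeIfAbsent(value, v -> new BitSet()).set(rowOrdinal);
    }

    /** Rows whose value lies in [fromInclusive, toExclusive). */
    BitSet range(long fromInclusive, long toExclusive) {
        BitSet result = new BitSet();
        for (BitSet rows : rowsByValue.subMap(fromInclusive, toExclusive).values()) {
            result.or(rows);
        }
        return result;
    }

    public static void main(String[] args) {
        RegionColumnIndexSketch index = new RegionColumnIndexSketch();
        long[] values = {30, 5, 12, 5, 20}; // made-up values for row ordinals 0..4
        for (int row = 0; row < values.length; row++) {
            index.add(row, values[row]);
        }
        System.out.println(index.range(5, 20)); // prints {1, 2, 3}
    }
}
```

The values are compared as numbers rather than as byte arrays, and the matching rows still come out in row order rather than value order.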
> Implementation notes
> • Only index Storefiles. Every index scan performs a full memstore scan. Indexing the memstore will be implemented only if scanning the memstore proves to be a performance bottleneck
> • Index expression evaluation is performed using bit sets. There are two types of bitsets: compressed and expanded. An index will typically store a compressed bitset, while an expression evaluator will most probably use an expanded bitset
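A small sketch of the compressed/expanded split described in the second note. The description does not say how the patch compresses its bitsets, so this assumes the simplest possible compressed form, a sorted array of the set positions, and uses java.util.BitSet as the expanded form. BitSetFormsSketch, compress and expand are hypothetical names.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch: the compressed format here is an assumption.
final class BitSetFormsSketch {

    /** Compressed form: only the sorted positions of the set bits. */
    static int[] compress(BitSet expanded) {
        return expanded.stream().toArray(); // set-bit indices in ascending order
    }

    /** Expanded form: a plain BitSet that supports fast and/or. */
    static BitSet expand(int[] compressed) {
        BitSet expanded = new BitSet();
        for (int position : compressed) {
            expanded.set(position);
        }
        return expanded;
    }

    public static void main(String[] args) {
        // Two index entries as they might be stored (made-up data).
        int[] storedA = {1, 4, 1000};
        int[] storedB = {4, 9, 1000};

        // The evaluator expands them, intersects, and can compress the result again.
        BitSet work = expand(storedA);
        work.and(expand(storedB));
        System.out.println(work);                            // prints {4, 1000}
        System.out.println(Arrays.toString(compress(work))); // prints [4, 1000]
    }
}
```

The trade-off shown is that sparse entries stay small while stored, and the evaluator pays the expansion cost only for the entries an expression actually touches.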
> + TODO
> This patch changes some of hbase core so that a region implementation other than the default HRegion can be instantiated. It fixes bugs in filter too.
> Would like to add this as a contrib package on the 0.20 branch in time for 0.20.3 if possible.
