hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/SecondaryIndexing" by jgray
Date Mon, 28 Feb 2011 18:42:59 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hbase/SecondaryIndexing" page has been changed by jgray.
http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing

--------------------------------------------------

New page:
= HBase Secondary Indexing =

This is a design document covering different approaches to secondary indexing in HBase.

== Eventually Consistent Secondary Indexes using Coprocessors ==

The basic idea is to use an additional (secondary) table for each index on the main (primary)
table.  A coprocessor bound to a column family would define a given secondary index
on that family (or on specific column(s) within it).

When a Put comes in to the primary table, the following would happen:

1. Generate WALEdit for primary table
2. Generate a new, special kind of WALEdit for secondary table update
3. 
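
The two steps listed so far can be sketched with a small in-memory model. This is not HBase code and uses no HBase API: the class name, the tuple-shaped log entries, and the separate "apply later" step are all assumptions made for illustration. Deferring the index update is one possible reading of "eventually consistent"; the remaining steps of the actual design are not yet specified above.

```python
from collections import deque


class IndexedTableSketch:
    """Hypothetical in-memory stand-in for a primary table plus one index table."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.primary = {}       # row key -> {column: value}
        self.index = {}         # indexed value -> set of primary row keys
        self.wal = []           # append-only log of edits, in arrival order
        self.pending = deque()  # index edits that are logged but not yet applied

    def put(self, row, columns):
        # Step 1: log and apply the edit for the primary table.
        self.wal.append(("primary", row, dict(columns)))
        old_value = self.primary.get(row, {}).get(self.indexed_column)
        self.primary.setdefault(row, {}).update(columns)

        # Step 2: log a separate, specially tagged edit for the index table.
        if self.indexed_column in columns:
            edit = ("index", row, old_value, columns[self.indexed_column])
            self.wal.append(edit)
            self.pending.append(edit)

    def apply_pending_index_edits(self):
        # Assumption: index edits are applied later, so the index can lag
        # behind the primary table until this method runs.
        while self.pending:
            _, row, old_value, new_value = self.pending.popleft()
            if old_value is not None:
                self.index.get(old_value, set()).discard(row)
            self.index.setdefault(new_value, set()).add(row)

    def lookup(self, value):
        # Return the primary row keys currently indexed under this value.
        return sorted(self.index.get(value, set()))
```

In this model a lookup issued between `put` and `apply_pending_index_edits` can miss the new row, which is the kind of window an eventually consistent index would have to tolerate.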



Open questions:

* How to deal with creation of secondary tables
* 


Future work:

* Declaration of indexes via API or shell syntax rather than programmatically with a coprocessor-per-index
* Creation of indexes on existing tables (indexes built from current data and then kept up to date)
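
As a rough illustration of building an index from data that already exists, the sketch below scans a plain dictionary standing in for a table. The function name and the data shapes are hypothetical, and it says nothing about how writes arriving during the build would be handled, which is the harder part of this item.

```python
def build_index(rows, indexed_column):
    """Scan existing rows and return {indexed value: sorted list of row keys}."""
    index = {}
    for row_key, columns in rows.items():
        # Rows without the indexed column simply do not appear in the index.
        if indexed_column in columns:
            index.setdefault(columns[indexed_column], []).append(row_key)
    for keys in index.values():
        keys.sort()
    return index
```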


== Secondary Indexes using Optimistic Concurrency Control ==

These are implemented by Transactional HBase / IndexedTable.

Currently this lives here:  https://github.com/hbase-trx/hbase-transactional-tableindexed


== In-memory Secondary Indexes for Indexed Scans ==

This was implemented once but I'm not sure where it lives anymore.
