hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "Hbase/SecondaryIndexing" by Eugene Koontz
Date Mon, 11 Apr 2011 17:42:58 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hbase/SecondaryIndexing" page has been changed by Eugene Koontz.
The comment on this change is: fix item lists.
http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing?action=diff&rev1=4&rev2=5

--------------------------------------------------

  = HBase Secondary Indexing =
- 
  This is a design document around different approaches to secondary indexing in HBase.
  
  == Eventually Consistent Secondary Indexes using Coprocessors ==
- 
  The basic idea is to use an additional (secondary) table for each index on the main (primary)
table.  A coprocessor binding to a family would be used to define a given secondary index
on that family (or specific column(s) within it).  The WAL would be used to ensure durability
and a shared queue makes the secondary update async from the caller's POV.  Normal HBase timestamps
would be used for any conflict resolution and to make operations idempotent.
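The write path described above can be sketched as follows, with plain maps standing in for the primary and secondary HBase tables. This is a minimal illustration of the pattern, not HBase code: in a real deployment the put hook would run inside a region coprocessor and the worker would issue normal Puts against the secondary table, and all names here are made up for the sketch.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Sketch: eventually consistent secondary index fed by a shared queue. */
public class AsyncSecondaryIndex {
    /** primary table: rowKey -> value */
    static final Map<String, String> primary = new ConcurrentHashMap<>();
    /** secondary table (the index): indexed value -> primary rowKey */
    static final Map<String, String> secondary = new ConcurrentHashMap<>();
    /** shared queue making the index update async from the caller's POV */
    static final BlockingQueue<String[]> pending = new LinkedBlockingQueue<>();

    /** Primary write: apply the edit, enqueue the index edit, return to caller. */
    public static void put(String row, String value) {
        primary.put(row, value);                // durable via the WAL in HBase
        pending.add(new String[] {value, row}); // picked up later by the worker
    }

    /** One unit of the queue worker's job: apply a queued index edit. */
    public static void applyOnePending() throws InterruptedException {
        String[] edit = pending.take();
        secondary.put(edit[0], edit[1]);        // a normal Put in HBase
    }
}
```

In the real design the worker would be a thread or threadpool draining the queue continuously; `applyOnePending` exposes a single unit of that work so the flow is easy to follow.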
  
  When a Put comes into the primary table, the following would happen (assuming a single
index update to a single secondary table for now):
@@ -22, +20 @@

  
  6. Return to client
  
- 
  The shared queue would be a thread or threadpool that picks up these secondary table edit
jobs and applies them using a normal Put operation to the secondary table.
  
  On failover of the primary table, primary edits would be replayed normally, and secondary edits
would be applied to the secondary table/server as is done with the shared queue.
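The reliance on normal HBase timestamps for idempotence can be illustrated with a small sketch: each cell keeps only its newest (value, timestamp) pair, as a table with a single retained version would, so replaying an edit that was already applied is a no-op. The class and method names are illustrative, not HBase API.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: timestamped edits make replay idempotent. */
public class TimestampedCell {
    static final class Cell {
        final String value;
        final long ts;
        Cell(String value, long ts) { this.value = value; this.ts = ts; }
    }

    final Map<String, Cell> table = new HashMap<>();

    /** Apply an edit only if it is at least as new as what is stored. */
    public void put(String row, String value, long ts) {
        Cell cur = table.get(row);
        if (cur == null || ts >= cur.ts) {
            table.put(row, new Cell(value, ts));
        }
    }

    public String get(String row) {
        Cell c = table.get(row);
        return c == null ? null : c.value;
    }
}
```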
@@ -37, +34 @@

  
  Or we could tie secondary edits to each memstore, so that a memstore can only be flushed
once all of its secondary edits have been applied; that would tie in with the existing
semantics around log eviction, but it has other implications and won't really help prevent
excessive over-replay.
  
- 
  Other open questions:
  
- * Creation of secondary tables (auto-bootstrapped?  part of coprocessor init?  manual?)
+  * Creation of secondary tables (auto-bootstrapped?  part of coprocessor init?  manual?)

- * Read API
+  * Read API
  
  Future work:
  
- * Declaration of indexes via API or shell syntax rather than programmatically with a coprocessor-per-index
+  * Declaration of indexes via API or shell syntax rather than programmatically with a coprocessor-per-index
- * Creation of indexes on existing tables (build of indexes based on current data and kept
up to date)
+  * Creation of indexes on existing tables (build of indexes based on current data and kept
up to date)
- * Option to apply secondary update in a synchronous fashion (if you want to take performance
hit and have stronger consistency of the index)
+  * Option to apply secondary update in a synchronous fashion (if you want to take performance
hit and have stronger consistency of the index)
- * Storing of primary table data in secondary table to provide single-lookup denormalized
join
+  * Storing of primary table data in secondary table to provide single-lookup denormalized
join
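The last future-work item, the single-lookup denormalized join, can be sketched as a secondary table that stores a copy of the primary row's data alongside the back-reference, so an index read needs one lookup instead of index-then-primary. Again, the maps and names here are illustrative stand-ins, not HBase API.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: denormalized secondary index serving a single-lookup join. */
public class DenormalizedIndex {
    /** secondary table: indexed value -> (primary rowKey, copied primary data) */
    static final Map<String, String[]> secondary = new HashMap<>();

    /** Index a primary row, copying its data into the secondary table. */
    public static void index(String primaryRow, String indexedValue, String primaryData) {
        secondary.put(indexedValue, new String[] {primaryRow, primaryData});
    }

    /** One lookup returns both the primary key and its data. */
    public static String[] lookup(String indexedValue) {
        return secondary.get(indexedValue);
    }
}
```

The trade-off is the usual one for denormalization: reads get cheaper at the cost of extra storage and extra write work keeping the copy in sync.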
  
  == Secondary Indexes using Optimistic Concurrency Control ==
  
@@ -56, +52 @@

  
  Currently this lives here:  https://github.com/hbase-trx/hbase-transactional-tableindexed
  
- 
  == In-memory Secondary Indexes for Indexed Scans ==
- 
  This was implemented once but I'm not sure where it lives anymore.
  
