hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HBASE-2000) Coprocessors
Date Sat, 11 Jun 2011 14:13:00 GMT

     [ https://issues.apache.org/jira/browse/HBASE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-2000.

       Resolution: Fixed
    Fix Version/s: 0.92.0
         Assignee:     (was: Andrew Purtell)
     Hadoop Flags: [Reviewed]

> Coprocessors
> ------------
>                 Key: HBASE-2000
>                 URL: https://issues.apache.org/jira/browse/HBASE-2000
>             Project: HBase
>          Issue Type: New Feature
>          Components: coprocessors
>            Reporter: Andrew Purtell
>             Fix For: 0.92.0
> From Google's Jeff Dean, in a keynote to LADIS 2009
> (http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, slides 66-67):
> BigTable Coprocessors (New Since OSDI'06)
> * Arbitrary code that runs next to each tablet in a table
>     ** As tablets split and move, coprocessor code automatically splits/moves too
> * High-level call interface for clients
>     ** Unlike RPC, calls addressed to rows or ranges of rows
> * Coprocessor client library resolves to actual locations
>     ** Calls across multiple rows automatically split into multiple parallelized RPCs
> * Very flexible model for building distributed services
>     ** Automatic scaling, load balancing, request routing for apps
> Example Coprocessor Uses
> * Scalable metadata management for Colossus (next gen GFS-like file system)
> * Distributed language model serving for machine translation system
> * Distributed query processing for full-text indexing support
> * Regular expression search support for code repository
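
The call model in the quoted slides (calls addressed to row ranges, with a client library that resolves them to actual locations) can be illustrated with a minimal sketch. Everything here is hypothetical: `RangeRouter`, `addRegion`, and `resolve` are illustrative names and are not part of Bigtable or HBase. The sketch assumes each region is identified by its inclusive start row and that a call covering several regions becomes one request per region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: none of these names come from HBase or Bigtable.
final class RangeRouter {
    // Maps each region's start row (inclusive) to the server hosting it.
    private final TreeMap<String, String> regionStartToServer = new TreeMap<>();

    void addRegion(String startRow, String server) {
        regionStartToServer.put(startRow, server);
    }

    // Returns one target server per region overlapping [startRow, endRow),
    // in row order. A real client would then issue these calls in parallel.
    List<String> resolve(String startRow, String endRow) {
        List<String> targets = new ArrayList<>();
        if (startRow.compareTo(endRow) >= 0) {
            return targets; // empty or inverted range
        }
        // The region containing startRow is the one with the greatest
        // start key that is <= startRow.
        String first = regionStartToServer.floorKey(startRow);
        if (first == null) {
            first = startRow; // no region begins at or before startRow
        }
        // Every region that starts before endRow overlaps the range.
        targets.addAll(
            regionStartToServer.subMap(first, true, endRow, false).values());
        return targets;
    }
}
```

With regions starting at "", "g", and "p" hosted on rs1, rs2, and rs3, a call addressed to rows ["c", "k") resolves to rs1 and rs2, while ["g", "p") resolves to rs2 alone.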
> For HBase, adding a coprocessor framework will allow functionality to be added
> incrementally as plugins. There will be no more need to subclass the regionserver
> interface and implementation classes and set {{hbase.regionserver.class}} and
> {{hbase.regionserver.impl}} in hbase-site.xml. That mechanism allows for extension,
> but because each property names a single class, only to the exclusion of all others.
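
To make the contrast with subclassing concrete, here is a minimal sketch of pluggable, composable extension. It is not the HBase coprocessor API: `PutHook`, `HookedRegion`, `load`, and `put` are hypothetical names chosen for illustration. The only point is that several independent extensions can be loaded side by side, which a single replacement class cannot offer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not the HBase coprocessor API.
interface PutHook {
    // Called before a write is applied; returns the value to store.
    String prePut(String row, String value);
}

final class HookedRegion {
    private final List<PutHook> hooks = new ArrayList<>();

    // Extensions are added incrementally; none excludes the others.
    void load(PutHook hook) {
        hooks.add(hook);
    }

    // Runs every loaded hook in load order and returns the value
    // that would be stored for the row.
    String put(String row, String value) {
        String current = value;
        for (PutHook hook : hooks) {
            current = hook.prePut(row, current);
        }
        return current;
    }
}
```

Loading one hook that trims whitespace and another that upper-cases the value means `put("r1", " abc ")` yields "ABC"; either hook can be removed without touching the other.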
> Also, HBASE-2001 currently contains an in-process MapReduce framework for the
> regionservers. Coprocessors can optionally implement a 'MapReduce' interface that
> clients will be able to invoke concurrently on all regions of the table. Note that
> this is not MapReduce on the table; this is MapReduce on each region, concurrently.
> One can implement MapReduce in a manner very similar to Hadoop's MR framework, or
> use shared variables to avoid the overhead of generating (and processing) a lot of
> intermediate data. An initial application could be support for rapid calculation of
> aggregates over data stored in HBase.
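
The per-region aggregation idea can be sketched in plain Java. This is an illustration under stated assumptions, not the HBASE-2001 framework: `RegionAggregateSketch`, `aggregateRegion`, and `average` are hypothetical names, each region is modeled as an in-memory list of values, and a thread pool stands in for concurrent invocation on the regionservers. Each region returns only a small partial result (sum and count), so no large set of intermediates is produced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; regions are modeled as in-memory lists.
final class RegionAggregateSketch {
    // Runs "next to the data": reduces one region to {sum, count}.
    static long[] aggregateRegion(List<Long> regionValues) {
        long sum = 0;
        for (long value : regionValues) {
            sum += value;
        }
        return new long[] { sum, regionValues.size() };
    }

    // Client side: invoke every region concurrently, then merge partials.
    static double average(List<List<Long>> regions) {
        ExecutorService pool =
            Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Future<long[]>> partials = new ArrayList<>();
            for (List<Long> region : regions) {
                partials.add(pool.submit(() -> aggregateRegion(region)));
            }
            long sum = 0;
            long count = 0;
            for (Future<long[]> partial : partials) {
                long[] p = partial.get();
                sum += p[0];
                count += p[1];
            }
            return count == 0 ? Double.NaN : (double) sum / count;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("region aggregation failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

For three regions holding [1, 2, 3], [4], and [5, 6], the partials are (6, 3), (4, 1), and (11, 2), and the merged average is 3.5.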

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
