hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-2000) Coprocessors
Date Fri, 10 Jun 2011 22:51:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047639#comment-13047639 ]

stack commented on HBASE-2000:
------------------------------

Andrew, can we close this issue now?

> Coprocessors
> ------------
>
>                 Key: HBASE-2000
>                 URL: https://issues.apache.org/jira/browse/HBASE-2000
>             Project: HBase
>          Issue Type: New Feature
>          Components: coprocessors
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>
> From Google's Jeff Dean, in a keynote to LADIS 2009 (http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, slides 66 - 67):
> BigTable Coprocessors (New Since OSDI'06)
> * Arbitrary code that runs next to each tablet in a table
>     ** As tablets split and move, coprocessor code automatically splits/moves too
> * High-level call interface for clients
>     ** Unlike RPC, calls addressed to rows or ranges of rows
> * coprocessor client library resolves to actual locations
>     ** Calls across multiple rows automatically split into multiple parallelized RPCs
> * Very flexible model for building distributed services
>     ** Automatic scaling, load balancing, request routing for apps
> Example Coprocessor Uses
> * Scalable metadata management for Colossus (next gen GFS-like file system)
> * Distributed language model serving for machine translation system
> * Distributed query processing for full-text indexing support
> * Regular expression search support for code repository
> For HBase, adding a coprocessor framework will allow functionality to be added incrementally as pluggable extensions. There will be no more need to subclass the regionserver interface and implementation classes and set {{hbase.regionserver.class}} and {{hbase.regionserver.impl}} in hbase-site.xml. That mechanism allows for extension, but only for one extension at a time, to the exclusion of all others.
> Also, HBASE-2001 currently contains an in-process MapReduce framework for the regionservers. Coprocessors can optionally implement a 'MapReduce' interface which clients will be able to invoke concurrently on all regions of the table. Note this is not MapReduce on the table; this is MapReduce on each region, concurrently. One can implement MapReduce in a manner very similar to Hadoop's MR framework, or use shared variables to avoid the overhead of generating (and processing) many intermediate results. An initial application of this could be support for rapid calculation of aggregates over data stored in HBase.
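
As a rough illustration of the model described above (a call addressed to a range of rows, resolved to the regions that hold those rows, run concurrently on each region, and combined on the client), here is a minimal Java sketch. It is a toy in-memory model only: the Region class, the sumRange and demoTable methods, and the sample keys and values are all hypothetical and are not HBase or BigTable APIs. The per-region sum stands in for the kind of aggregate the description mentions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy model only: none of these types are HBase classes. */
public class RegionAggregateSketch {

    /** Hypothetical region: a sorted slice of the table covering [startKey, endKey). */
    public static final class Region {
        final String startKey;   // inclusive
        final String endKey;     // exclusive; null means "to the end of the table"
        final NavigableMap<String, Long> rows = new TreeMap<>();

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        /** True if this region's key range intersects the requested range [from, to). */
        boolean overlaps(String from, String to) {
            boolean startsBeforeRequestEnds = startKey.compareTo(to) < 0;
            boolean endsAfterRequestStarts = endKey == null || endKey.compareTo(from) > 0;
            return startsBeforeRequestEnds && endsAfterRequestStarts;
        }

        /** Stand-in for code running next to the data: sums only this region's rows in [from, to). */
        long partialSum(String from, String to) {
            long sum = 0;
            for (long value : rows.subMap(from, true, to, false).values()) {
                sum += value;
            }
            return sum;
        }
    }

    /** Client side: resolve the row range to regions, issue one concurrent call per region, combine. */
    public static long sumRange(List<Region> regions, String from, String to) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (Region region : regions) {
                if (region.overlaps(from, to)) {
                    partials.add(pool.submit(() -> region.partialSum(from, to)));
                }
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();   // only one small partial result per region comes back
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a region", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a per-region call failed", e);
        } finally {
            pool.shutdown();
        }
    }

    /** Made-up sample table split into three regions. */
    public static List<Region> demoTable() {
        Region first = new Region("", "h");
        first.rows.put("a", 1L);
        first.rows.put("c", 2L);
        first.rows.put("g", 3L);
        Region second = new Region("h", "p");
        second.rows.put("h", 10L);
        second.rows.put("m", 20L);
        Region third = new Region("p", null);
        third.rows.put("p", 100L);
        third.rows.put("z", 200L);
        List<Region> regions = new ArrayList<>();
        regions.add(first);
        regions.add(second);
        regions.add(third);
        return regions;
    }

    public static void main(String[] args) {
        // Rows c, g, h, m and p fall in [b, y): 2 + 3 + 10 + 20 + 100
        System.out.println(sumRange(demoTable(), "b", "y")); // prints 135
    }
}
```

In this sketch each region returns a single partial sum rather than its rows, which is the point of the "MapReduce on each region, concurrently" remark: the client combines a handful of small results instead of scanning the data itself.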

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
