hbase-issues mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2002) Coprocessors: Client side support
Date Thu, 07 Oct 2010 19:38:31 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919037#action_12919037 ]

HBase Review Board commented on HBASE-2002:
-------------------------------------------

Message from: "Gary Helmling" <ghelmling@gmail.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/816/
-----------------------------------------------------------

(Updated 2010-10-07 12:27:43.994582)


Review request for hbase, stack, Andrew Purtell, and Jonathan Gray.


Changes
-------

Remove an unnecessary List-to-array conversion and copy in the refactored
HConnectionManager.HConnectionImplementation.processBatch(), and fix mishandling of retries
in the refactored code.


Summary
-------

This is really two separate patches in one, though with some overlapping changes.  If necessary
I can split them apart for separate review.  Please let me know if that would make review
easier.

Part 1:
==============
A port of HADOOP-6422 to the HBase RPC code.  The goal of this change is to allow alternate
RPC client/server implementations to be enabled through a simple configuration change.  Ultimately
I would like to use this to allow secure RPC to be enabled through configuration, while not
blocking normal (current) RPC operation on non-secure Hadoop versions.

This portion of the patch abstracts out two interfaces from the RPC code:

RpcEngine: HBaseRPC uses this to obtain proxy instances for client calls and server instances
for HMaster and HRegionServer
RpcServer: this allows differing RPC server implementations, breaking the dependency on HBaseServer

The bulk of the current code from HBaseRPC is moved into WritableRpcEngine and is unchanged
other than the interface requirements.  So the current call path remains the same, other than
the HBaseRPC.getProtocolEngine() abstraction.
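
As a rough illustration of the configuration-driven selection described above, here is a minimal
sketch that loads an engine class by name.  The interface shape, the "rpc.engine" key, and the
use of java.util.Properties as the configuration object are stand-ins invented for this example;
they are not taken from the patch.

```java
import java.util.Properties;

// Hypothetical, simplified stand-in for the RpcEngine abstraction.
interface SketchRpcEngine {
  String name();
}

// Default engine, playing the role WritableRpcEngine has in the patch.
class SketchWritableEngine implements SketchRpcEngine {
  public String name() { return "writable"; }
}

// Alternate engine that a deployment could switch to, for example a secure one.
class SketchSecureEngine implements SketchRpcEngine {
  public String name() { return "secure"; }
}

class SketchRpc {
  // Assumed configuration key; the real key name is not given in this message.
  static final String ENGINE_KEY = "rpc.engine";

  // Instantiate the configured engine, falling back to the default when unset.
  static SketchRpcEngine getProtocolEngine(Properties conf) {
    String cls = conf.getProperty(ENGINE_KEY, SketchWritableEngine.class.getName());
    try {
      return Class.forName(cls).asSubclass(SketchRpcEngine.class)
          .getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot load RPC engine " + cls, e);
    }
  }
}
```

With this shape, switching implementations only requires setting the key to another class name;
callers keep going through getProtocolEngine().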


Part 2:
===============
The remaining changes provide server-side hooks for registering new RPC protocols/handlers
(per-region to support coprocessors), and client side hooks to support dynamic execution of
the registered protocols.

The new RPC protocols are restricted to implementations of
org.apache.hadoop.hbase.ipc.CoprocessorProtocol (which extends VersionedProtocol), to prevent
arbitrary execution of methods against HMasterInterface, HRegionInterface, etc.

For protocol handler registration, HRegionServer provides a new method:

  public <T extends CoprocessorProtocol> boolean registerProtocol(
      byte[] region, Class<T> protocol, T handler)

which builds a Map of region name to protocol instances for dispatching client calls.
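
To make the dispatch map concrete, here is a minimal sketch of such a registry.  The class and
interface names are hypothetical, and the meaning given to the boolean return value (false when
a handler is already registered) is an assumption; the message does not specify it.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical marker interface standing in for CoprocessorProtocol.
interface SketchRegionProtocol {}

class SketchProtocolRegistry {
  // byte[] uses identity-based equals/hashCode, so the outer map is a TreeMap
  // ordered by array contents rather than a HashMap.
  private static final Comparator<byte[]> BY_CONTENT = Arrays::compare;

  private final Map<byte[], Map<Class<?>, SketchRegionProtocol>> handlers =
      new TreeMap<byte[], Map<Class<?>, SketchRegionProtocol>>(BY_CONTENT);

  // Assumption: returns false if this region already has a handler for the protocol.
  synchronized <T extends SketchRegionProtocol> boolean registerProtocol(
      byte[] region, Class<T> protocol, T handler) {
    Map<Class<?>, SketchRegionProtocol> forRegion =
        handlers.computeIfAbsent(region.clone(), k -> new HashMap<>());
    return forRegion.putIfAbsent(protocol, handler) == null;
  }

  // Find the handler for a region and protocol, or null if none is registered.
  synchronized <T extends SketchRegionProtocol> T lookup(byte[] region, Class<T> protocol) {
    Map<Class<?>, SketchRegionProtocol> forRegion = handlers.get(region);
    return forRegion == null ? null : protocol.cast(forRegion.get(protocol));
  }
}
```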


Client invocations are performed through HTable, which adds the following methods:


  public <T extends CoprocessorProtocol> T proxy(Class<T> protocol, Row row)

This directly returns a proxy instance to the CoprocessorProtocol implementation registered
for the region serving row "row".  Any method calls will be proxied to the region's server
and invoked using the map of registered region name -> handler instances.
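
The proxying idea can be sketched with java.lang.reflect.Proxy.  This is only an illustration
under simplifying assumptions: the names are hypothetical, and the "remote" call is a local
reflective invocation, where a real client would serialize the method name and arguments and
send them to the region server hosting the row.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical stand-in for a CoprocessorProtocol sub-interface.
interface SketchPing {
  String ping();
  String hello(String name);
}

class SketchProxyFactory {
  // Return a proxy implementing 'protocol' whose calls are forwarded to 'target'.
  // Forwarding is done locally here; it marks the spot where an RPC would go.
  static <T> T proxy(Class<T> protocol, T target) {
    InvocationHandler forward = (p, method, args) -> method.invoke(target, args);
    return protocol.cast(Proxy.newProxyInstance(
        protocol.getClassLoader(), new Class<?>[] {protocol}, forward));
  }
}
```

The caller only ever sees the protocol interface, so the transport behind the handler can change
without touching client code.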

Calls directed against multiple rows are a bit more complicated.  They are supported with
the methods:

  public <T extends CoprocessorProtocol, R> void exec(
      Class<T> protocol, List<? extends Row> rows,
      BatchCall<T,R> callable, BatchCallback<R> callback)

  public <T extends CoprocessorProtocol, R> void exec(
      Class<T> protocol, RowRange range,
      BatchCall<T,R> callable, BatchCallback<R> callback)

where BatchCall and BatchCallback are simple interfaces defining the methods to be called
and a callback instance to be invoked for each result.
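
For orientation, the two interfaces and the per-row loop might look roughly like the sketch
below.  The method shapes are inferred from how they are used in this message; the regionOf and
instanceFor functions are hypothetical stand-ins for region lookup and proxy creation, and the
definitions in the patch may differ.

```java
import java.util.List;
import java.util.function.Function;

// The method to run against each protocol instance.
interface SketchBatchCall<T, R> {
  R call(T instance);
}

// Receives one result per invocation.
interface SketchBatchCallback<R> {
  void update(byte[] region, byte[] row, R value);
}

class SketchBatchExec {
  // Invoke 'callable' once per row and hand each result to 'callback'.
  static <T, R> void exec(List<byte[]> rows,
      Function<byte[], byte[]> regionOf, Function<byte[], T> instanceFor,
      SketchBatchCall<T, R> callable, SketchBatchCallback<R> callback) {
    for (byte[] row : rows) {
      byte[] region = regionOf.apply(row);
      R value = callable.call(instanceFor.apply(region));
      callback.update(region, row, value);
    }
  }
}
```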

For the sample CoprocessorProtocol interface:

  interface PingProtocol extends CoprocessorProtocol {
    public String ping();
    public String hello(String name);
  }

a client invocation might look like:

    final Map<byte[],String> results = new TreeMap<byte[],String>(...);
    List<Row> rows = ...
    table.exec(PingProtocol.class, rows,
        new HTable.BatchCall<PingProtocol,String>() {
          public String call(PingProtocol instance) {
            return instance.ping();
          }
        },
        new BatchCallback<String>(){
          public void update(byte[] region, byte[] row, String value) {
            // keyed by region, so a later row in the same region overwrites an earlier one
            results.put(region, value);
          }
        });

The BatchCall.call() method will be invoked for each row in the passed-in list, and the
BatchCallback.update() method will be invoked for each return value.  However, currently the
PingProtocol.ping() invocation will result in a separate RPC call per row, which is less than ideal.

Support is in place to make use of the HRegionServer.multi() invocations for batched RPC (see
the org.apache.hadoop.hbase.client.Exec class), but this does not mesh well with the current
client-side interface.

In addition to standard code review, I'd appreciate any thoughts on the client interactions
in particular, and whether they would meet some of the anticipated uses of coprocessors.


This addresses bug HBASE-2002.
    http://issues.apache.org/jira/browse/HBASE-2002


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/client/Action.java 556ea81 
  src/main/java/org/apache/hadoop/hbase/client/HConnection.java 65f7618 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 4ad91c6 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java 0dbf263 
  src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 74593bf 
  src/main/java/org/apache/hadoop/hbase/client/MultiAction.java c6ea838 
  src/main/java/org/apache/hadoop/hbase/client/MultiResponse.java 91bd04b 
  src/main/java/org/apache/hadoop/hbase/client/coprocessor/Batch.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/client/coprocessor/Exec.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/client/coprocessor/ExecResult.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/client/coprocessor/package-info.java PRE-CREATION

  src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 83f623d 
  src/main/java/org/apache/hadoop/hbase/ipc/ConnectionHeader.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/ipc/CoprocessorProtocol.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/ipc/ExecRPCInvoker.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java 2b5eeb6 
  src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java e23a629 
  src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java e4c356d 
  src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java ee5dd8f 
  src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/ipc/RpcEngine.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java 6b800e6 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 5f829e4 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 36c404d 
  src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java d4166cf 
  src/main/resources/hbase-default.xml 5fafe65 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestServerCustomProtocol.java PRE-CREATION


Diff: http://review.cloudera.org/r/816/diff


Testing
-------


Thanks,

Gary




> Coprocessors: Client side support
> ---------------------------------
>
>                 Key: HBASE-2002
>                 URL: https://issues.apache.org/jira/browse/HBASE-2002
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Gary Helmling
>             Fix For: 0.90.0
>
>
> "High-level call interface for clients. Unlike RPC, calls addressed to rows or ranges
> of rows. Coprocessor client library resolves to actual locations. Calls across multiple rows
> automatically split into multiple parallelized RPCs"
> Generic multicall RPC facility which incorporates this and multiget/multiput/multidelete
> and parallel scanners.
> Group and batch RPCs by region server. Track and retry outstanding RPCs. Ride over region
> relocations.
> Support addressing by explicit region identifier or by row key or row key range. 
> Include a facility for merging results client side. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

