hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gary Helmling" <ghelml...@gmail.com>
Subject Re: Review Request: [HBASE-2321] [HBASE-2002] Add support for per-region dynamically registered RPC endpoints for coprocessors and allow configurable RPC client/server implementations
Date Fri, 17 Sep 2010 19:00:55 GMT

> On 2010-09-14 21:08:10, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 1112
> > <http://review.cloudera.org/r/816/diff/2/?file=11433#file11433line1112>
> >
> >     Has white space on end of line
> >     
> >     You want to catch these IEs?  We are moving toward letting them out or at least
remarking the Thread w/ the interrupted flag (thread.interrupt).

Ok, I'll marked interrupted flag and pass back up an IOE.

> On 2010-09-14 21:08:10, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 1296
> > <http://review.cloudera.org/r/816/diff/2/?file=11434#file11434line1296>
> >
> >     What if the remote region does not have the configured protocol?  Can that happen?
 How does remote region get freighted w/ the coprocessor?

In this case an IOE gets thrown indicating that the requested protocol was not found.  I could
add an UnknownProtocolException and include the requested protocol class if that seems worth
it.  In any case, seems like I should make sure this is a DoNotRetryIOException to avoid spinning
our wheels if the protocol handler isn't there.

> On 2010-09-14 21:08:10, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/client/ExecResult.java, line 51
> > <http://review.cloudera.org/r/816/diff/2/?file=11431#file11431line51>
> >
> >     What will a value be?  Anythign?

Yes, the idea is to support arbitrary methods with whatever return types.  I made this Object
to allow easier expression of standard Java types the HbaseObjectWritable.writeObject() can
already handle.  Seemed a little simpler for the implementer.  

On the other hand, as Andy commented, making this Writable would make the requirements clearer,
but would make handling basic types more cumbersome and just shift the unwrapping/casting
to the client (like calling HbaseObjectWritable.get() to get back an int value).

> On 2010-09-14 21:08:10, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/client/Action.java, line 30
> > <http://review.cloudera.org/r/816/diff/2/?file=11429#file11429line30>
> >
> >     On your overarching comment:
> >     
> >     + One patch is fine.
> >     + Should the key be the encoded region name rather than regionname for map of
regions to protocols since the latter can be fat?
> >     
> >     I think I need some more illustration of how the calls will be made.  Seems
super awkward at moment.  Keep using the ping coprocessor as example.  Its basic.  Basic is
good for me.
> >     
> >     What you thinking for the batch calls? When you say "but this does not mesh
well with the current client-side interface."... whats this mean?  You want to change the

Encoded region name makes sense.  I can change in Action and related for some savings.  For
the HTable.exec() methods returning Maps, encoded vs. regionname probably doesn't make much
difference.  I could change those as well, but if anything, it might be better to key those
on HRegionInfo as long as it doesn't incur any extra lookups.  Will take a look at that.

I'll expand on the ping examples a bit (or come up with something else), but we'll also have
a more realistic example when hooked in to coprocessors at the region level.

For the batch calls, it would be easy to add something like the following to HTable:

  public Map<byte[],Object> exec(List<Exec>)

using the HRegionServer.multi() support that I already modified.  But I think that winds up
being awkward from the client standpoint as you'll have to construct Exec instances with string
method names or using reflection.

Directly exposing the dynamic proxy up to the user/client level seems like the most expressive
way to do support arbitrary CoprocessorProtocol interfaces.  BatchCall accomplishes this,
thought the Java anonymous class mechanism is of course a bit clunky.

The dynamic proxy used with BatchCall provides an easy way to capture the invoked methods
and arguments, but combined with the per-region invocation doesn't make it easy to batch up
what are essentially identical calls for all the regions we know will be impacted.  There's
a bit of a chicken and egg problem in that we don't know what's being invoked until the proxy
methods are called (per-region), but we do have overall region set as well to be able to do
a multi-call.  Seems like we should be able to catch the first region's invocation and do
a single batch RPC for all the regions and share the results between proxies.  I'm just trying
to figure out the least crappy way of doing so. :)

So by "current client-side interface", I just mean the exec(..., BatchCall) interface I've
exposed, which supports arbitrary calls easily, but makes the RPC efficiency a little more

- Gary

This is an automatically generated e-mail. To reply, visit:

On 2010-09-10 17:29:57, Gary Helmling wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/816/
> -----------------------------------------------------------
> (Updated 2010-09-10 17:29:57)
> Review request for hbase, Andrew Purtell and Jonathan Gray.
> Summary
> -------
> This is really two separate patches in one, though with some overlapping changes.  If
necessary I can split them apart for separate review.  Please let me know if that would make
review easier.
> Part 1:
> ==============
> Port over of HADOOP-6422 to the HBase RPC code.  The goal of this change is to allow
alternate RPC client/server implementations to be enabled through a simple configuration change.
 Ultimately I would like to use this to allow secure RPC to be enabled through configuration,
while not blocking normal (current) RPC operation on non-secure Hadoop versions.
> This portion of the patch abstracts out two interfaces from the RPC code:
> RpcEngine: HBaseRPC uses this to obtain proxy instances for client calls and server instances
for HMaster and HRegionServer
> RpcServer: this allows differing RPC server implementations, breaking the dependency
on HBaseServer
> The bulk of the current code from HBaseRPC is moved into WritableRpcEngine and is unchanged
other than the interface requirements.  So the current call path remains the same, other than
the HBaseRPC.getProtocolEngine() abstraction.
> Part 2:
> ===============
> The remaining changes provide server-side hooks for registering new RPC protocols/handlers
(per-region to support coprocessors), and client side hooks to support dynamic execution of
the registered protocols.
> The new RPC protocol actions are constrained to org.apache.hadoop.hbase.ipc.CoprocessorProtocol
implementations (which extends VersionedProtocol) to prevent arbitrary execution of methods
against HMasterInterface, HRegionInterface, etc.
> For protocol handler registration, HRegionServer provides a new method:
>   public <T extends CoprocessorProtocol> boolean registerProtocol(
>       byte[] region, Class<T> protocol, T handler)
> which builds a Map of region name to protocol instances for dispatching client calls.
> Client invocations are performed through HTable, which adds the following methods:
>   public <T extends CoprocessorProtocol> T proxy(Class<T> protocol, Row row)
> This directly returns a proxy instance to the CoprocessorProtocol implementation registered
for the region serving row "row".  Any method calls will be proxied to the region's server
and invoked using the map of registered region name -> handler instances.
> Calls directed against multiple rows are a bit more complicated.  They are supported
with the methods:
>   public <T extends CoprocessorProtocol, R> void exec(
>       Class<T> protocol, List<? extends Row> rows,
>       BatchCall<T,R> callable, BatchCallback<R> callback)
>   public <T extends CoprocessorProtocol, R> void exec(
>       Class<T> protocol, RowRange range,
>       BatchCall<T,R> callable, BatchCallback<R> callback)
> where BatchCall and BatchCallback are simple interfaces defining the methods to be called
and a callback instance to be invoked for each result.
> For the sample CoprocessorProtocol interface:
>   interface PingProtocol extends CoprocessorProtocol {
>     public String ping();
>     public String hello(String name);
>   }
> a client invocation might look like:
>     final Map<byte[],R> results = new TreeMap<byte[],R>(...)
>     List<Row> rows = ...
>     table.exec(PingProtocol.class, rows,
>         new HTable.BatchCall<PingProtocol,String>() {
>           public String call(PingProtocol instance) {
>             return instance.ping();
>           }
>         },
>         new BatchCallback<R>(){
>           public void update(byte[] region, byte[] row, R value) {
>             results.put(region, value);
>           }
>         });
> The BatchCall.call() method will be invoked for each row in the passed in list, and the
BatchCallback.update() method will be invoked for each return value.  However, currently the
PingProtocol.ping() invocation will result in a separate RPC call per row, which is less that
> Support is in place to make use of the HRegionServer.multi() invocations for batched
RPC (see the org.apache.hadoop.hbase.client.Exec class), but this does not mesh well with
the current client-side interface.
> In addition to standard code review, I'd appreciate any thoughts on the client interactions
in particular, and whether they would meet some of the anticipated uses of coprocessors.
> This addresses bugs HBASE-2002 and HBASE-2321.
>     http://issues.apache.org/jira/browse/HBASE-2002
>     http://issues.apache.org/jira/browse/HBASE-2321
> Diffs
> -----
>   src/main/java/org/apache/hadoop/hbase/client/Action.java 556ea81 
>   src/main/java/org/apache/hadoop/hbase/client/Exec.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/client/ExecResult.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 65f7618 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java f1fa836 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 0dbf263 
>   src/main/java/org/apache/hadoop/hbase/client/MultiAction.java dc0e203 
>   src/main/java/org/apache/hadoop/hbase/client/MultiResponse.java 60008b1 
>   src/main/java/org/apache/hadoop/hbase/client/RowRange.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/client/Scan.java 29b3cb0 
>   src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 83f623d 
>   src/main/java/org/apache/hadoop/hbase/ipc/ConnectionHeader.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/ipc/CoprocessorProtocol.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/ipc/ExecRPCInvoker.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java 2b5eeb6 
>   src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java 134288b 
>   src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java d3c6c21 
>   src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java c3e24d9 
>   src/main/java/org/apache/hadoop/hbase/ipc/RpcEngine.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/master/HMaster.java adc6292 
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1e046f9 
>   src/main/resources/hbase-default.xml 2fcfcc8 
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestServerCustomProtocol.java PRE-CREATION

> Diff: http://review.cloudera.org/r/816/diff
> Testing
> -------
> Thanks,
> Gary

View raw message