hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2002) Coprocessors: Client side support
Date Tue, 05 Oct 2010 22:52:37 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918296#action_12918296
] 

HBase Review Board commented on HBASE-2002:
-------------------------------------------

Message from: "Gary Helmling" <ghelmling@gmail.com>


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/Action.java, line 35
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13679#file13679line35>
bq.  >
bq.  >     Action is a new class so we are not breaking any pre-exisiting API here (Even
so, erasure would reduce this API change to the old I believe anyways?).
bq.  >

Action and MultiAction seem to just be internal implementation classes, so I thought this
would be a safe refactoring.

You mean for something like a rolling restart?  I believe the type erasure + ignoring return
types on method lookup (getResult return type was parameterized so would be Object) should
make this continue to work, but I may have introduced breakage elsewhere...


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 37
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line37>
bq.  >
bq.  >     I think the fact that this class if of coprocessors needs to be highlighted
better.  Batch is a super generic name yet this Batch is only for CPs.  A subpackage for these
CP classes would be a pain would it?  Any other way of grouping these CP classes?  A prefix?
 Just throwing it out there (I'm sure you thought about it and had a reason for not going
these routes).

The name is I guess a remnant of these changes starting out as overly-generic, then scaled
back to coprocessors specific implementation.  I don't see any general applicability beyond
that at the moment, so clearing up the naming would probably help.  Don't want to get too
wordy though...  CoprocessorBatch?


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 50
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line50>
bq.  >
bq.  >     cool
bq.  >     
bq.  >     But, this method's name is 'returning'?
bq.  >     
bq.  >     So, it executes the 'method' of 'protocol' and returns the hosting  'Call' whose
invocation has already run?
bq.  >     
bq.  >     Should it be 'execute' or 'executeCall' or 'invokeCall', etc.

This just returns a Batch.Call instance, whose call() method will invoke the specified CoprocessorProtocol
method.  So it returns a Batch.Call that returns the method result.  At this point the remote
invocation has not yet happened.  That won't occur until down in HConnectionManager.HConnectionImplementation.processExecs().
 Reached through passing the Batch.Call instance to HTable.exec(...).

Yeah, "returning()" is a little generic too.  I could rename to forMethod() or callingMethod()?


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 80
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line80>
bq.  >
bq.  >     Do you need the 'Batch.' prefix here?

No, it's extraneous and inconsistent with most HBase code.  I'll drop it.  I guess I was having
a "static method invocations should be referenced by class name" moment...


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 132
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line132>
bq.  >
bq.  >     Should this method be 'public' since its only called in here -- whats returned
out of a 'returning' is an exhausted call.. the receiving caller will not be doing a call
invocation?

Could make this default access I suppose.  It's invoked down in HConnectionManager.HConnectionImplementation.processExecs(),
which then will trigger the underlying CoprocessorProtocol method invocation and RPC call
through ExecRPCInvoker.


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 140
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line140>
bq.  >
bq.  >     I don't see Callback passed to call in the above.  I suppose how Callback works
will become clear later.

Yes, Callback is passed to the second version of HTable.exec() for specific handling of Batch.Call.call()
return values.  The first version of HTable.exec() just uses a basic Callback that builds
a Map with results.


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/Exec.java, line 65
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13681#file13681line65>
bq.  >
bq.  >     This is fair I suppose if only one coprocessor per region (Is that true)?

Correct, only a single handler can be registered per CoprocessorProtocol subclass per region.


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/ExecResult.java, line 33
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13682#file13682line33>
bq.  >
bq.  >     This class is for CPs only?

Yes, same as Exec above, could be renamed to CoprocessorExecResult for clarity.


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/HConnection.java, line 241
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13683#file13683line241>
bq.  >
bq.  >     Want to call this out as a CP method?
bq.  >

This is actually a refactoring of the previous processBatch() method to accommodate Exec instances
as well.

The processExecs() method below doesn't make use of it yet, but I'd like to incorporate that
as an immediate next step for better RPC batching.


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/HConnection.java, line 252
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13683#file13683line252>
bq.  >
bq.  >     Cool
bq.  >     
bq.  >     So this would be for a single cooprocessor implementation. 
bq.  >     
bq.  >     You say above that we do an rpc per row but are the rpcs run serially or in
parallell?  If parallel, thats sweet.

They're parallelized using the existing HTable ExecutorService.

But as an immediate next step, I would like to get them also batching into a single RPC call
per CoprocessorProtocol method invocation, per region server.  The scaffolding is already
there in the (Multi)Action parameterization and HConnection.processBatchCallback(), I just
need to coordinate the per RS batching through ExecRPCInvoker.


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 1036
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13684#file13684line1036>
bq.  >
bq.  >     Whats going on here?  You are rigging the createCallable so it can be used by
CPs?
bq.  >     
bq.  >     (no objection, just asking)

Correct, this and the other (Multi)Action parameterization makes it possible to handle MultiAction<Exec>
instances for full batching of the CP method calls per region server.  It's just the ExecRPCInvoker
coordination that isn't quite there yet.


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/HConnection.java, line 255
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13683#file13683line255>
bq.  >
bq.  >     So, here is a case where row designates a region, right?  Not a 'row'.  If all
these CP classes were in a sub-package, you could do a package-info on CP w/ examples, etc.
-- copy/paste of the stuff you have above?

Right, in this case the row key is just used to lookup the region location, so we can ultimately
do getRegionServerWithRetries().


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 1087
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13684#file13684line1087>
bq.  >
bq.  >     Should this be public?  Its not in HConnection, is it?  Or, rather, why is it
not in HConnection?

Should be in HConnection as well.


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 1182
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13684#file13684line1182>
bq.  >
bq.  >     Again, this is rigging MultiAction to support the CP parameterized types?

Correct, scaffolding for a better ExecRPCInvoker to be added.


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java, line 353
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13686#file13686line353>
bq.  >
bq.  >     oh, you don't have to repeat this doc up in HTable.  I'd remove it and replace
it '@Override'.

Ah, ok, will remove the dupes.


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 1340
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13685#file13685line1340>
bq.  >
bq.  >     What if I pass more than one key for a region?  Will CP run twice?

No, if both keys fall into the same region, then it should just use the start key.


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/RowRange.java, line 4
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13689#file13689line4>
bq.  >
bq.  >     You have to say something about 'inclusive'?

I'll remove RowRange, since it was only used in a previous version of the HTable.exec() signatures.
 It's nicely parallel to Row, but with only a single implementation (Scan), seems useless.


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 112
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line112>
bq.  >
bq.  >     good
bq.  >     
bq.  >     (but of interest, how does this differ from setCause?  Or, could you pass the
ite to the IOE constructor?

Sure, will just pass the ITE as the cause.


bq.  On 2010-10-04 23:09:49, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/Exec.java, line 35
bq.  > <http://review.cloudera.org/r/816/diff/9/?file=13681#file13681line35>
bq.  >
bq.  >     ... against a Coprocessor?
bq.  >     
bq.  >     Maybe add above?
bq.  >

Yes, could rename CoprocessorExec for clarity.  It's fairly generic but there's no other usage.

I guess in the naming here and elsewhere, I was envisioning Coprocessors as the sort of stored
procedures of HBase.  A basic functionality -- execute this user code -- with coprocessors
as the implementation.  So I took a general approach to naming the client interface.  It seemed
to fit in with the basic operations: Get, Scan, Put, Delete, Exec.

But if this is overly general and confusing, I have no problem renaming this and the other
client bits with a "Coprocessor" prefix.

Will definitely improve javadoc here as well.


- Gary


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/816/#review1414
-----------------------------------------------------------





> Coprocessors: Client side support
> ---------------------------------
>
>                 Key: HBASE-2002
>                 URL: https://issues.apache.org/jira/browse/HBASE-2002
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Gary Helmling
>             Fix For: 0.90.0
>
>
> "High-level call interface for clients. Unlike RPC, calls addressed to rows or ranges
of rows. Coprocessor client library resolves to actual locations. Calls across multiple rows
automatically split into multiple parallelized RPCs"
> Generic multicall RPC facility which incorporates this and multiget/multiput/multidelete
and parallel scanners.
> Group and batch RPCs by region server. Track and retry outstanding RPCs. Ride over region
relocations. 
> Support addressing by explicit region identifier or by row key or row key range. 
> Include a facility for merging results client side. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message