hbase-issues mailing list archives

From "Gary Helmling (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10169) Batch coprocessor
Date Mon, 16 Dec 2013 23:56:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849883#comment-13849883 ]

Gary Helmling commented on HBASE-10169:
---------------------------------------

Sure, the way I see #1 is simply batching the RPCs performed for HRegion.execService() invocations
for regions on the same regionserver.  In the same way that a multi-get will do a single RPC
to return results from Get requests across multiple regions on a single regionserver, a batched
coprocessor service request for all the relevant regions on a regionserver could return a
single response containing multiple CoprocessorServiceResponse objects.

The high-level execution would look like:
# on the client (in HTable.coprocessorService()) group regions involved in a coprocessorService()
request by regionserver
# create a request object per-regionserver containing multiple CoprocessorServiceRequest instances
(one per region)
# the regionserver would execute the individual requests against each region, calling HRegion.execService()
# before returning, the regionserver aggregates the individual responses into a single response
object containing multiple CoprocessorServiceResponse instances (again one per region)
# the client, on receiving the response, invokes Batch.Callback.update() with the contents
of each CoprocessorServiceResponse

HRegionServer.multi() provides a good model for this, I think.
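
To make the steps above concrete, here is a minimal, self-contained sketch of the batching flow. It deliberately does not use the HBase API: region and regionserver names are plain strings, the per-region call is a caller-supplied function standing in for HRegion.execService(), and the callback stands in for Batch.Callback.update(). The class and method names (BatchedCoprocessorSketch, groupByServer, executeBatch, coprocessorService) are hypothetical and only illustrate the shape of the idea, not the eventual patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical illustration only; none of these names exist in HBase.
public class BatchedCoprocessorSketch {

    // Step 1: group the regions involved in a request by hosting regionserver.
    public static Map<String, List<String>> groupByServer(Map<String, String> regionToServer) {
        Map<String, List<String>> byServer = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : regionToServer.entrySet()) {
            byServer.computeIfAbsent(e.getValue(), s -> new ArrayList<>()).add(e.getKey());
        }
        return byServer;
    }

    // Steps 3-4 (server side): run the per-region call for every region in the
    // batch and collect the results into one response, keyed by region.
    public static <R> Map<String, R> executeBatch(List<String> regions,
                                                  Function<String, R> execService) {
        Map<String, R> response = new LinkedHashMap<>();
        for (String region : regions) {
            response.put(region, execService.apply(region));
        }
        return response;
    }

    // Steps 2 and 5 (client side): issue one simulated RPC per regionserver and
    // hand each per-region result to the callback. Returns the number of RPCs.
    public static <R> int coprocessorService(Map<String, String> regionToServer,
                                             Function<String, R> execService,
                                             BiConsumer<String, R> callback) {
        int rpcs = 0;
        for (List<String> regions : groupByServer(regionToServer).values()) {
            Map<String, R> response = executeBatch(regions, execService); // one "RPC"
            rpcs++;
            response.forEach(callback);
        }
        return rpcs;
    }

    public static void main(String[] args) {
        // Made-up layout: three regions spread over two regionservers.
        Map<String, String> layout = new LinkedHashMap<>();
        layout.put("region-a", "rs1");
        layout.put("region-b", "rs1");
        layout.put("region-c", "rs2");

        int rpcs = coprocessorService(layout,
                region -> region.length(), // stand-in for the real endpoint call
                (region, result) -> System.out.println(region + " -> " + result));
        System.out.println("RPCs issued: " + rpcs);
    }
}
```

In this sketch the callback still fires once per region, so code written against a per-region callback would not need to change; only the number of round trips does.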

This could all happen transparently, using the existing HTable.coprocessorService() client
interface, and would be a massive improvement in RPC efficiency: one RPC per regionserver
instead of one per region.

Regarding #2, providing user-defined aggregations on the server side (or "combiners" as described
in HBASE-5762) could provide further efficiency improvements by limiting response bandwidth
for some use cases, but I think it deserves to be looked at on its own, given that it would
create an entirely new user-facing API.

> Batch coprocessor
> -----------------
>
>                 Key: HBASE-10169
>                 URL: https://issues.apache.org/jira/browse/HBASE-10169
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Coprocessors
>    Affects Versions: 0.99.0
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: Batch Coprocessor Design Document.docx, HBASE-10169.patch
>
>
> This is designed to improve coprocessor invocation on the client side.
> Currently, a coprocessor invocation sends one call to each region. If there is one
> region server hosting 100 regions, each coprocessor invocation sends 100 calls, and
> each call uses a single thread on the client side. The threads run out quickly when
> coprocessor invocations are heavy.
> In this design, all the calls to the same region server within a single coprocessor
> invocation are grouped into one call. On the server side, this call is spread across
> the regions, and the results are merged there before being returned to the client.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
