hbase-issues mailing list archives

From "Karthick Sankarachary (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-2939) Allow Client-Side Connection Pooling
Date Tue, 19 Apr 2011 07:13:06 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021435#comment-13021435 ]

Karthick Sankarachary commented on HBASE-2939:
----------------------------------------------

{quote}It would help build confidence if you had a unit test to prove that the RoundRobinPool
was indeed RoundRobin{quote}

Point well taken. I've added a suite of tests to cover all types of pool maps.

{quote}I see the Pool Interface. Seems to be straight subset of Map Interface? Do we need
PoolMap then? Or should PoolMap be non-public?{quote}
The primary role of the {{PoolMap}} is to associate a pool of values with every key. The first
time a key is inserted, it creates a pool of the specified type and puts the value into
that pool. Subsequent inserts for the same key put the value into the pre-existing pool.
By the same token, when a key is removed, its corresponding pool is cleared.
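
As a rough illustration of the behavior described above, here is a minimal sketch. It is not the code in the patch: the class name {{PoolMapSketch}} and its methods are hypothetical, and a plain {{ArrayList}} stands in for whichever pool type is configured.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; not the PoolMap class from the patch.
class PoolMapSketch<K, V> {
    // Each key owns its own pool; a plain list stands in for the pool type.
    private final Map<K, List<V>> pools = new HashMap<>();

    // The first insert for a key creates its pool; later inserts reuse it.
    void put(K key, V value) {
        pools.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Number of values pooled under the key (0 if the key is absent).
    int size(K key) {
        List<V> pool = pools.get(key);
        return pool == null ? 0 : pool.size();
    }

    // Removing a key clears its pool and drops the mapping.
    boolean remove(K key) {
        List<V> pool = pools.remove(key);
        if (pool == null) {
            return false;
        }
        pool.clear();
        return true;
    }
}
```

Two inserts under the same key land in one pool, while a different key gets a pool of its own.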

The {{Pool}} interface, while it may seem like a subset of {{Map}}, was meant to be generic
enough that it could represent not just a bounded list (a.k.a. round-robin pool), or a bounded
queue (a.k.a. reusable pool), but also a thread-local object (a.k.a. thread-local pool).
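
To make the point concrete, the sketch below shows one small interface backed by a bounded list, a bounded queue, and a thread-local. All names ({{PoolSketch}}, {{get}}, {{put}}, and the three classes) are assumptions made for illustration; the actual {{Pool}} interface in the patch may look different.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

// Hypothetical interface; the method names are assumptions, not the patch's API.
interface PoolSketch<R> {
    R get();              // obtain a resource, or null if none is available
    void put(R resource); // offer a resource to the pool
}

// Bounded list: get() cycles through the stored resources in order.
class RoundRobinSketch<R> implements PoolSketch<R> {
    private final List<R> resources = new ArrayList<>();
    private final int maxSize;
    private int next = 0;

    RoundRobinSketch(int maxSize) { this.maxSize = maxSize; }

    public synchronized R get() {
        if (resources.isEmpty()) {
            return null;
        }
        R resource = resources.get(next);
        next = (next + 1) % resources.size();
        return resource;
    }

    public synchronized void put(R resource) {
        // Resources beyond the cap are silently ignored in this sketch.
        if (resources.size() < maxSize) {
            resources.add(resource);
        }
    }
}

// Bounded queue: get() checks a resource out; put() returns it for reuse.
class ReusableSketch<R> implements PoolSketch<R> {
    private final Queue<R> idle = new LinkedList<>();
    private final int maxSize;

    ReusableSketch(int maxSize) { this.maxSize = maxSize; }

    public synchronized R get() { return idle.poll(); }

    public synchronized void put(R resource) {
        if (idle.size() < maxSize) {
            idle.add(resource);
        }
    }
}

// Thread-local: each thread sees only the resource it stored itself.
class ThreadLocalSketch<R> implements PoolSketch<R> {
    private final ThreadLocal<R> local = new ThreadLocal<>();

    public R get() { return local.get(); }

    public void put(R resource) { local.set(resource); }
}
```

The thread-local variant is the reason a plain collection-style interface would not fit: it has no meaningful notion of iterating over or sizing its contents across threads.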

{quote}Should the pool implementations be inner classes of PoolMap because PoolMap refers
to them explicitly in enum and in its little factory for creating them.{quote}
Agreed. I made the {{PoolMap}} a self-contained entity.

{quote}Why does javadoc talk about SharedMap? Should that be PoolMap?{quote}
Fixed.

FYI, the version of the updated patch is V6 (sorry about the poor patch naming convention).



> Allow Client-Side Connection Pooling
> ------------------------------------
>
>                 Key: HBASE-2939
>                 URL: https://issues.apache.org/jira/browse/HBASE-2939
>             Project: HBase
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 0.89.20100621
>            Reporter: Karthick Sankarachary
>            Assignee: ryan rawson
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-2939-0.20.6.patch, HBASE-2939-LATEST.patch, HBASE-2939.patch,
HBASE-2939.patch, HBASE-2939.patch, HBaseClient.java
>
>
> By design, the HBase RPC client multiplexes calls to a given region server (or the master
for that matter) over a single socket, access to which is managed by a connection thread defined
in the HBaseClient class. While this approach may suffice for most cases, it tends to break
down in the context of a real-time, multi-threaded server, where latencies need to be lower
and throughputs higher. 
> In brief, the problem is that we dedicate one thread to handle all client-side reads
and writes for a given server, which in turn forces them to share the same socket. As load
increases, this is bound to serialize calls on the client-side. In particular, when the rate
at which calls are submitted to the connection thread is greater than that at which the server
responds, then some of those calls will inevitably end up sitting idle, just waiting their
turn to go over the wire.
> In general, sharing sockets across multiple client threads is a good idea, but limiting
the number of such sockets to one may be overly restrictive for certain cases. Here, we propose
a way of defining multiple sockets per server endpoint, access to which may be managed through
either a load-balancing or thread-local pool. To that end, we define the notion of a SharedMap,
which maps a key to a resource pool, and supports both of those pool types. Specifically,
we will apply that map in the HBaseClient, to associate multiple connection threads with each
server endpoint (denoted by a connection id). 
>  Currently, the SharedMap supports the following types of pools:
>     * A ThreadLocalPool, which represents a pool that builds on the ThreadLocal class.
It essentially binds the resource to the thread from which it is accessed.
>     * A ReusablePool, which represents a pool that builds on the LinkedList class. It
essentially allows a resource to be checked out, at which point it is (temporarily) removed
from the pool. When the resource is no longer required, it should be returned to the pool
in order to be reused.
>     * A RoundRobinPool, which represents a pool that stores its resources in an ArrayList.
It load-balances access to its resources by returning a different resource every time a given
key is looked up.
> To control the type and size of the connection pools, we give the user a couple of parameters
(viz. "hbase.client.ipc.pool.type" and "hbase.client.ipc.pool.size"). In case the size of
the pool is set to a non-zero positive number, that is used to cap the number of resources
that a pool may contain for any given key. A size of Integer#MAX_VALUE is interpreted to mean
an unbounded pool.
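
For illustration only, the two parameters described above could be interpreted along these lines. This is a sketch under stated assumptions: {{java.util.Properties}} stands in for the client configuration, the class {{PoolSettingsSketch}} is hypothetical, and the defaults, the "RoundRobin" type name, and the fallback for non-positive sizes are all assumptions rather than behavior confirmed by the patch.

```java
import java.util.Properties;

// Hypothetical helper; java.util.Properties stands in for the client configuration.
class PoolSettingsSketch {
    static final String TYPE_KEY = "hbase.client.ipc.pool.type";
    static final String SIZE_KEY = "hbase.client.ipc.pool.size";

    final String type;
    final int maxSize;

    PoolSettingsSketch(Properties conf) {
        // Assumed defaults: a pool named "RoundRobin" holding one resource.
        this.type = conf.getProperty(TYPE_KEY, "RoundRobin");
        int size = Integer.parseInt(conf.getProperty(SIZE_KEY, "1"));
        // Assumption: a non-positive size falls back to 1.
        this.maxSize = size > 0 ? size : 1;
    }

    // Integer.MAX_VALUE is interpreted as "no cap".
    boolean isUnbounded() {
        return maxSize == Integer.MAX_VALUE;
    }

    // True if a pool currently holding currentSize resources may accept another.
    boolean hasRoomFor(int currentSize) {
        return isUnbounded() || currentSize < maxSize;
    }
}
```

With a size of 3, a pool for any given key would accept a third resource but refuse a fourth.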

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
