hbase-issues mailing list archives

From "Phil Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16388) Prevent client threads being blocked by only one slow region server
Date Thu, 18 Aug 2016 06:08:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425959#comment-15425959 ]

Phil Yang commented on HBASE-16388:
-----------------------------------

Will we use a non-blocking Stub in the blocking Table interface? This patch only handles
BlockingStub. And I think AsyncTable won't block users' threads, so we need not impose any
limit on concurrent requests per server in a Stub that is only used by AsyncTable.

> Prevent client threads being blocked by only one slow region server
> -------------------------------------------------------------------
>
>                 Key: HBASE-16388
>                 URL: https://issues.apache.org/jira/browse/HBASE-16388
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>         Attachments: HBASE-16388-v1.patch, HBASE-16388-v2.patch, HBASE-16388-v2.patch
>
>
> It is a common use case for HBase users to have several threads/handlers in their
> service, with each handler holding its own Table/HTable instance. Generally users assume
> each handler is independent and will not interact with the others.
> However, in an extreme case where one region server is very slow, every request to that
> RS will time out, and the handlers of the users' service may be occupied by these
> long-waiting requests, so that even requests belonging to other RSes also time out.
> For example:
> Suppose we have 100 handlers in a client service (timeout is 1000 ms) and HBase has 10
> region servers whose average response time is 50 ms. If no region server is slow, we can
> handle 2000 requests per second (100 handlers / 0.05 s per request).
> Now suppose this service's QPS is 1000, and one region server is so slow that all
> requests to it time out. Users hope that only 10% of requests fail while the other 90%
> still see 50 ms response times, because only 10% of requests are routed to the slow RS.
> However, each second 100 long-waiting requests accumulate (10% of 1000 QPS, each waiting
> the full 1 s timeout), which occupies exactly all 100 handlers. So all handlers are
> blocked, and the availability of the service drops to almost zero.
> To prevent this, we can limit the maximum number of concurrent requests to a single RS
> at the process level. Requests exceeding the limit will throw a ServerBusyException
> (extends DoNotRetryIOE) to users immediately. In the above case, if we set this limit to
> 20, only 20 handlers will be occupied and the other 80 handlers can still serve requests
> to the other RSes. The availability of the service is 90%, as expected.
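The process-level limit described in the quoted text could be sketched roughly as below. Note this is an illustrative sketch, not the actual patch: the class names PerServerLimiter and ServerBusyException's shape here are assumptions (in the real patch ServerBusyException extends DoNotRetryIOException), and the counting is done with a plain per-server AtomicInteger that rejects acquisitions past the threshold.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative stand-in for HBase's ServerBusyException (which, per the
// issue description, extends DoNotRetryIOException so clients fail fast).
class ServerBusyException extends RuntimeException {
    ServerBusyException(String server) {
        super("Too many concurrent requests to server " + server);
    }
}

// Hypothetical process-level limiter: one in-flight counter per region
// server; a request that would push the counter past the limit is
// rejected immediately instead of tying up a client handler.
class PerServerLimiter {
    private final int maxConcurrent;
    private final ConcurrentHashMap<String, AtomicInteger> inFlight =
        new ConcurrentHashMap<>();

    PerServerLimiter(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    // Call before sending a request; fails fast if the server is saturated.
    void acquire(String server) {
        AtomicInteger count =
            inFlight.computeIfAbsent(server, s -> new AtomicInteger());
        if (count.incrementAndGet() > maxConcurrent) {
            count.decrementAndGet(); // roll back the optimistic increment
            throw new ServerBusyException(server);
        }
    }

    // Call when the request completes or times out, freeing the slot.
    void release(String server) {
        inFlight.get(server).decrementAndGet();
    }
}
```

In the scenario above, a limiter configured with a threshold of 20 would cap the slow RS at 20 occupied handlers; acquisitions for other region servers use independent counters and are unaffected.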



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
