hbase-issues mailing list archives

From "ChiaPing Tsai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16224) Reduce the number of RPCs for the large PUTs
Date Thu, 18 Aug 2016 13:13:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426407#comment-15426407

ChiaPing Tsai commented on HBASE-16224:

hi [~yuzhihong@gmail.com]

Q1) In the new spreadsheet, hbase-master(3, 2000, 2000, 6, seq), what's the meaning of 3 threads
A1) Three threads submit data concurrently. (same Connection instance and different table

Q2) Do you know why hbase-16224(3, 2000, 2000, 6, seq) exhibited a much better speedup than
hbase-16224(3, 2000, 1000, 6, seq)?
A2) BufferedMutatorImpl grabs only a few mutations when each mutation carries many KVs. If
too many tasks are already running, those few mutations lead to busy-waiting and small requests,
because the AP (AsyncProcess) keeps iterating over the same collection of rows.
This patch lets the AP access the inner buffer of BufferedMutatorImpl, which has two benefits:
1) The AP can take different rows on the next iteration if the current rows are located on
busy regions or regionservers.
2) The AP can generate large requests because it can iterate over all rows instead of a subset.

In summary, BufferedMutatorImpl has no way of knowing which rows are "good" for the AP. If
there are too many rows to process, it is likely that BufferedMutatorImpl grabs the "wrong"
rows for the AP.
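
To illustrate the idea only, here is a minimal sketch of picking rows from the whole buffer while skipping busy servers. It is not the code in the patch: BufferedRow, RowPicker, takeSubmittable, busyServers and maxRows are hypothetical names chosen for this example, and the row-to-server mapping is assumed to be known up front.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical model, not an HBase class: a buffered row tagged with its hosting server.
final class BufferedRow {
    final String rowKey;
    final String server;

    BufferedRow(String rowKey, String server) {
        this.rowKey = rowKey;
        this.server = server;
    }
}

// Hypothetical helper standing in for "the AP iterating the inner buffer".
final class RowPicker {
    // Walks the whole buffer, removes and returns up to maxRows rows whose
    // server is not busy. Rows on busy servers stay in the buffer for a later pass.
    static List<BufferedRow> takeSubmittable(List<BufferedRow> buffer,
                                             Set<String> busyServers,
                                             int maxRows) {
        List<BufferedRow> taken = new ArrayList<>();
        Iterator<BufferedRow> it = buffer.iterator();
        while (it.hasNext() && taken.size() < maxRows) {
            BufferedRow row = it.next();
            if (!busyServers.contains(row.server)) {
                taken.add(row);
                it.remove();
            }
        }
        return taken;
    }
}
```

If the caller had instead been handed only the first few rows of the buffer, and all of them lived on a busy server, it would have had nothing to submit and would have had to wait.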

> Reduce the number of RPCs for the large PUTs
> --------------------------------------------
>                 Key: HBASE-16224
>                 URL: https://issues.apache.org/jira/browse/HBASE-16224
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: ChiaPing Tsai
>            Assignee: ChiaPing Tsai
>            Priority: Minor
>         Attachments: HBASE-16224-v1.patch, HBASE-16224-v2.patch, HBASE-16224-v3.patch,
HBASE-16224-v4.patch, HBASE-16224-v5.patch, HBASE-16224-v6.patch, HBASE-16224-v7.patch, HBASE-16224-v8.patch,
HBASE-16224-v9.patch, experiment-v9.patch.xlsx, experiment.xlsx
> This patch is proposed to reduce the number of RPCs for large PUTs.
> The number and data size of write threads (SingleServerRequestRunnable) result from
> three main factors:
> 1) The flush size taken by BufferedMutatorImpl#backgroundFlushCommits
> 2) The limit on the number of tasks
> 3) ClientBackoffPolicy
> Many requests carrying few mutations are created for two reasons:
> 1) Many regions of the target table are on different servers.
> 2) The flush size in step one is summed over “all” servers rather than per “individual” server.
> This patch removes the flush size limit in step one and adds a maximum size to submit
> for each server in the AsyncProcess.
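
The per-server limit described above can be sketched roughly as follows. This is an assumption-laden illustration, not the patch itself: SizedMutation, PerServerLimiter, groupWithCap and maxBytesPerServer are hypothetical names, and each mutation is assumed to already know its target server and size.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not an HBase class: a mutation with a target server and a size in bytes.
final class SizedMutation {
    final String server;
    final long heapSize;

    SizedMutation(String server, long heapSize) {
        this.server = server;
        this.heapSize = heapSize;
    }
}

// Hypothetical helper: applies the size cap per server instead of across all servers.
final class PerServerLimiter {
    // Groups pending mutations by server. A mutation that would push its server's
    // request past maxBytesPerServer is appended to leftover for a later round.
    static Map<String, List<SizedMutation>> groupWithCap(List<SizedMutation> pending,
                                                         long maxBytesPerServer,
                                                         List<SizedMutation> leftover) {
        Map<String, List<SizedMutation>> requests = new LinkedHashMap<>();
        Map<String, Long> bytesPerServer = new HashMap<>();
        for (SizedMutation m : pending) {
            long used = bytesPerServer.getOrDefault(m.server, 0L);
            // The first mutation for a server is always accepted, so an oversized
            // mutation still makes progress instead of being deferred forever.
            boolean hasRequest = requests.containsKey(m.server);
            if (hasRequest && used + m.heapSize > maxBytesPerServer) {
                leftover.add(m);
                continue;
            }
            requests.computeIfAbsent(m.server, k -> new ArrayList<>()).add(m);
            bytesPerServer.put(m.server, used + m.heapSize);
        }
        return requests;
    }
}
```

With a single cap summed over all servers, a table spread across many servers would get only a small slice per server; capping each server separately lets every request grow up to the limit.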

This message was sent by Atlassian JIRA