hbase-issues mailing list archives

From "Chance Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-19389) RS's handlers are all busy when writing many columns (more than 1000 columns)
Date Thu, 30 Nov 2017 16:59:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16272941#comment-16272941
] 

Chance Li commented on HBASE-19389:
-----------------------------------

We ran some small tests of concurrent writes to the CSLM (ConcurrentSkipListMap); see the
numbers below. Response time (RT) grows very quickly, i.e. performance degrades, as
concurrency increases.
!CSLM-concurrent-write.png!
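
For readers who want to try a comparable measurement, here is a minimal sketch of a concurrent-write probe. It is not the test behind the chart above: the class name `CslmWriteProbe`, the key format and the thread counts are all assumptions made for illustration, and a plain `ConcurrentSkipListMap<String, Long>` stands in for the memstore's cell map.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical probe: times N threads inserting distinct keys into one CSLM. */
class CslmWriteProbe {

    /** Returns { final map size, elapsed nanoseconds } for one run. */
    static long[] run(int threads, int keysPerThread) {
        ConcurrentSkipListMap<String, Long> map = new ConcurrentSkipListMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);       // releases all writers at once
        CountDownLatch done = new CountDownLatch(threads);  // counts finished writers

        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                try {
                    start.await();
                    for (int k = 0; k < keysPerThread; k++) {
                        // Each put walks the skip list and invokes the key comparator.
                        map.put("row-" + id + "/col-" + k, (long) k);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }

        long begin = System.nanoTime();
        start.countDown();
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        } finally {
            pool.shutdown();
        }
        return new long[] { map.size(), System.nanoTime() - begin };
    }
}
```

Calling `CslmWriteProbe.run(threads, 10_000)` for increasing thread counts and dividing the elapsed time by the number of puts gives a rough per-write cost. The figures depend heavily on hardware and JVM warm-up, so treat them as indicative only.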

As for a solution, one option is to protect the CSLM from highly concurrent writes; another
is to improve the CSLM itself. By the way, in our scenario we don't want to use QoS (request
throttling). We have chosen a more engineering-oriented solution: protect the CSLM from
writes that are both highly concurrent and many-column. This way we avoid having all RS
handlers tied up in slow calls. In other words, other calls still have a chance to be handled.

About the patch:
1. Dynamic configuration, e.g. the minimum column count and the concurrency limit.
2. Returns RegionTooBusyException when the threshold is exceeded.
3. It is not a strict limit: we won't use a lock, so handlers may still be busy for a short time.
4. Applies only to multi ops, not Append.
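
To make the four points concrete, here is a minimal sketch of a lock-free soft limit. It is not the patch itself: the names `WideWriteGate`, `enter`, `exit` and `reconfigure` are hypothetical, and `TooBusyException` is a local stand-in for RegionTooBusyException (unchecked here for brevity) so that the example compiles without HBase on the classpath.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical soft limiter for concurrent many-column writes. */
class WideWriteGate {

    /** Local stand-in for RegionTooBusyException; unchecked to keep the sketch short. */
    static class TooBusyException extends RuntimeException {
        TooBusyException(String message) { super(message); }
    }

    // volatile so that a configuration change is visible to handlers without a lock
    private volatile int minColumns;
    private volatile int maxConcurrent;
    private final AtomicInteger inFlight = new AtomicInteger();

    WideWriteGate(int minColumns, int maxConcurrent) {
        this.minColumns = minColumns;
        this.maxConcurrent = maxConcurrent;
    }

    /** Point 1: thresholds can be changed while the server is running. */
    void reconfigure(int minColumns, int maxConcurrent) {
        this.minColumns = minColumns;
        this.maxConcurrent = maxConcurrent;
    }

    /**
     * Returns true if the write was counted and must later be passed to exit().
     * Writes with fewer than minColumns columns are never limited.
     */
    boolean enter(int columnCount) {
        if (columnCount < minColumns) {
            return false;
        }
        // Points 2 and 3: check-then-increment without a lock. Two handlers can both
        // pass the check, so the limit may be overshot briefly.
        if (inFlight.get() >= maxConcurrent) {
            throw new TooBusyException(
                "too many concurrent wide writes (" + columnCount + " columns)");
        }
        inFlight.incrementAndGet();
        return true;
    }

    /** Releases a slot taken by a counted write. */
    void exit(boolean counted) {
        if (counted) {
            inFlight.decrementAndGet();
        }
    }

    int inFlight() {
        return inFlight.get();
    }
}
```

A handler would call `enter(columnCount)` before applying a multi op and `exit(counted)` in a `finally` block. Because the check and the increment are separate steps, a burst can exceed `maxConcurrent` for a moment, which is the trade-off described in point 3.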

ycsb result with patch:
!ycsb-result.png!

metrics:
!metrics-1.png!
 
Any suggestions are welcome. I will upload the patch within 2 days, along with more test numbers.

> RS's handlers are all busy when writing many columns (more than 1000 columns) 
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-19389
>                 URL: https://issues.apache.org/jira/browse/HBASE-19389
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbase
>    Affects Versions: 2.0.0
>         Environment: 2000+ Region Servers
> PCI-E ssd
>            Reporter: Chance Li
>            Assignee: Chance Li
>            Priority: Minor
>             Fix For: 2.0.0, 3.0.0
>
>         Attachments: CSLM-concurrent-write.png, metrics-1.png, ycsb-result.png
>
>
> In a large cluster with a large number of clients, we found that the RS's handlers are
> sometimes all busy. After investigation, we found that the root cause lies in the CSLM,
> e.g. the heavy load of its compare function. We reviewed the related WALs and found that
> many columns (more than 1000) were being written at that time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
