hbase-issues mailing list archives

From "Chao Shi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10305) Batch update performance drops as the number of regions grows
Date Fri, 10 Jan 2014 05:27:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867513#comment-13867513
] 

Chao Shi commented on HBASE-10305:
----------------------------------

bq. You can first create a new test case that runs the batch update without the WAL, to make
sure that the problem is the HLog sync.

I tried specifying SKIP_WAL on the client side, and performance improved greatly. Moreover,
I think the stack traces confirm the problem: they show threads stuck waiting for the log
sync.

bq. It depends on what level of write guarantee you want. There is a delayed log syncing feature
already present. In that case there won't be an immediate sync after a log append. Instead, you
can configure the interval at which the syncer thread syncs the appends made so far.

Hi Anoop, I understand that delayed sync should alleviate this problem. But I think the
default behaviour (i.e. sync immediately) may be misleading. A good row-key design spreads
the workload evenly over the cluster. However, this leads to unexpected performance
degradation as data grows. (The overhead of log syncs increases linearly if the batch size is
larger than the number of servers.)
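
To make the trade-off between immediate and delayed sync concrete, here is a minimal sketch in plain Java. It is a toy model, not HBase code: the class `DeferredSyncModel` and its methods are hypothetical names, and the periodic syncer thread is replaced by an explicit `sync()` call so the behaviour stays deterministic.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a write-ahead log. All names are hypothetical, not HBase APIs.
final class DeferredSyncModel {
    private final boolean syncOnAppend;                   // true = sync immediately
    private final List<String> unsynced = new ArrayList<>();
    private int syncs = 0;

    DeferredSyncModel(boolean syncOnAppend) {
        this.syncOnAppend = syncOnAppend;
    }

    // Append one edit; in immediate mode every append pays for a sync.
    void append(String edit) {
        unsynced.add(edit);
        if (syncOnAppend) {
            sync();
        }
    }

    // In delayed mode a syncer thread would call this once per configured interval.
    void sync() {
        if (unsynced.isEmpty()) {
            return;                                       // nothing to persist
        }
        unsynced.clear();
        syncs++;
    }

    int syncCount() {
        return syncs;
    }

    // Edits that would be lost if the server crashed right now.
    int unsyncedEdits() {
        return unsynced.size();
    }

    public static void main(String[] args) {
        DeferredSyncModel immediate = new DeferredSyncModel(true);
        DeferredSyncModel delayed = new DeferredSyncModel(false);
        for (int i = 0; i < 10; i++) {
            immediate.append("edit-" + i);
            delayed.append("edit-" + i);
        }
        delayed.sync();                                   // one simulated timer tick
        System.out.println("immediate syncs: " + immediate.syncCount());
        System.out.println("delayed syncs:   " + delayed.syncCount());
    }
}
```

In this model the delayed mode pays for far fewer syncs, but every edit appended since the last sync is exposed to loss until the next one runs. That is the write-guarantee trade-off mentioned in the quote above.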

It seems that this behaviour is intended, so I'm going to close this ticket if no one else
has a better suggestion.

> Batch update performance drops as the number of regions grows
> -------------------------------------------------------------
>
>                 Key: HBASE-10305
>                 URL: https://issues.apache.org/jira/browse/HBASE-10305
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance
>            Reporter: Chao Shi
>
> In our use case, we use a small number (~5) of proxy programs that read from a queue
and send batch updates to HBase. Our program is multi-threaded, and the HBase client batches
mutations per RS.
> We found that we get lower TPS when there are more regions. I think the reason is that the RS
syncs the HLog once per region. With a single region, a batch update touches only that
region and therefore syncs the HLog once. With 10 regions per server, RS#multi() has to
process the updates for each individual region and sync the HLog 10 times.
> Please note that in our scenario, batched mutations are usually independent of each
other and touch a varying number of regions.
> We are using the 0.94 series, but I think the trunk should have the same problem after
a quick look into the code.
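
The per-region sync cost described above can be sketched with a small model in plain Java. It assumes, as the description suggests, one HLog sync per region touched by a batch. The class `SyncCountModel` and its methods are hypothetical names for illustration and are not part of HBase.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only; none of these names come from HBase.
final class SyncCountModel {
    // Syncs needed for one batch if the server syncs once per region touched.
    static int syncsForBatch(int[] regionOfMutation) {
        Set<Integer> touched = new HashSet<>();
        for (int region : regionOfMutation) {
            touched.add(region);
        }
        return touched.size();
    }

    // Evenly spread row keys: mutation i lands in region i % regions.
    static int[] spreadEvenly(int batchSize, int regions) {
        int[] regionOfMutation = new int[batchSize];
        for (int i = 0; i < batchSize; i++) {
            regionOfMutation[i] = i % regions;
        }
        return regionOfMutation;
    }

    public static void main(String[] args) {
        int batchSize = 100;                              // assumed batch size
        for (int regions : new int[] {1, 10, 50}) {
            int syncs = syncsForBatch(spreadEvenly(batchSize, regions));
            System.out.println(regions + " region(s): " + syncs + " sync(s) per batch");
        }
    }
}
```

Under this assumption the batch size stays fixed while the number of syncs per batch grows with the number of regions touched, which would account for the drop in TPS.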



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
