hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10305) Batch update performance drops as the number of regions grows
Date Thu, 09 Jan 2014 06:53:52 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866388#comment-13866388 ]

Anoop Sam John commented on HBASE-10305:
----------------------------------------

bq. Suppose there is a single region, the batch update will only touch one region and therefore
syncs HLog once

Not always once. In HRegion, a batch write is split into mini batches. The region tries to
acquire as many row locks as it can, and the Mutations whose locks it obtains form one mini
batch. Each mini batch is written and synced to the HLog once. When no concurrent writes
target the same rows, all Mutations in the batch may get their locks in one shot, so there is
just one mini batch.
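
To make the mini batch point concrete, here is a minimal sketch of how contention on row locks can turn one batch into several mini batches, each costing one HLog write and sync. It is a simplified model, not the HRegion code: the class and method names are hypothetical, and it assumes that the first lock of each mini batch is taken by waiting, while any later lock that is not immediately available closes the current mini batch.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical model of mini batching; not taken from the HBase source. */
final class MiniBatchSketch {

    /**
     * Counts the mini batches (and therefore HLog syncs) for one batch of rows.
     * Assumption: rows in contendedRows are locked by another writer at the
     * moment the region tries to lock them without blocking.
     */
    static int countMiniBatches(List<String> rows, Set<String> contendedRows) {
        int miniBatches = 0;
        int next = 0;
        while (next < rows.size()) {
            int acquired = 0;
            while (next < rows.size()) {
                boolean busy = contendedRows.contains(rows.get(next));
                if (busy && acquired > 0) {
                    break; // lock not available right now: close this mini batch
                }
                // The first lock of a mini batch is modelled as a blocking acquire.
                acquired++;
                next++;
            }
            miniBatches++; // one HLog write + sync per mini batch
        }
        return miniBatches;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "c", "d");
        System.out.println(countMiniBatches(rows, Set.of()));         // 1
        System.out.println(countMiniBatches(rows, Set.of("c")));      // 2
        System.out.println(countMiniBatches(rows, Set.of("b", "c"))); // 3
    }
}
```

With no contended rows the whole batch is one mini batch, which matches the single sync case described above.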

So what do you suggest here?

> Batch update performance drops as the number of regions grows
> -------------------------------------------------------------
>
>                 Key: HBASE-10305
>                 URL: https://issues.apache.org/jira/browse/HBASE-10305
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance
>            Reporter: Chao Shi
>
> In our use case, we use a small number (~5) of proxy programs that read from a queue
> and batch update to HBase. Our program is multi-threaded, and the HBase client batches
> mutations to each RS.
> We found that we get lower TPS when there are more regions. I think the reason is that the
> RS syncs the HLog for each region. Suppose there is a single region: the batch update touches
> only one region and therefore syncs the HLog once. Now suppose there are 10 regions per
> server: in RS#multi() it has to process the update for each individual region and sync the
> HLog 10 times.
> Please note that in our scenario, batched mutations are usually independent of each
> other and touch a varying number of regions.
> We are using the 0.94 series, but after a quick look at the code I think trunk has the
> same problem.
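
The reporter's hypothesis can be sketched as a small model: if the region server syncs the HLog once per region touched by a batch, the sync count equals the number of distinct regions the batch's rows fall into. The code below is illustrative only. The class, the methods, and the hash-based `regionFor` mapping are assumptions made for the example (real regions are defined by row key ranges), and the model ignores the mini batch splitting described in the comment above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model: one HLog sync per region touched by a batch. Not HBase code. */
final class SyncsPerBatchSketch {

    /** Stand-in for region lookup; real HBase maps rows to regions by key range. */
    static int regionFor(String rowKey, int regionCount) {
        return Math.floorMod(rowKey.hashCode(), regionCount);
    }

    /** Number of syncs for one batch under the one-sync-per-touched-region assumption. */
    static int syncsForBatch(List<String> rowKeys, int regionCount) {
        Set<Integer> touchedRegions = new HashSet<>();
        for (String rowKey : rowKeys) {
            touchedRegions.add(regionFor(rowKey, regionCount));
        }
        return touchedRegions.size();
    }

    public static void main(String[] args) {
        List<String> batch = List.of("0", "1", "2", "3", "4", "5", "6", "7", "8", "9");
        System.out.println(syncsForBatch(batch, 1));  // 1: every row lands in the only region
        System.out.println(syncsForBatch(batch, 10)); // 10: these keys spread over all regions
    }
}
```

Under this model the same ten-row batch costs one sync with a single region and ten syncs with ten regions, which is the drop in TPS the report describes.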



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
