hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-14791) [0.98] CopyTable is extremely slow when moving delete markers
Date Tue, 10 Nov 2015 18:23:10 GMT

     [ https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-14791:
----------------------------------
    Description: 
We found that some of our CopyTable jobs run for many hours, even when there isn't that much
data to copy.

[~vik.karma] did his magic and found that the issue is with copying delete markers (we use
raw mode to also move deletes across).
Looking at the code in 0.98 it's immediately obvious that deletes (unlike puts) are not batched
and hence sent to the other side one by one, causing a network RTT for each delete marker.

Looks like in trunk it's doing the right thing (using BufferedMutators for all mutations in
TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 1.2?) issue.

  was:
We found that some of our copy table job run for many hours, even when there isn't that much
data to copy.

[~vik.karma] did his magic and found that the issue with copying delete markers (we use raw
mode to also move deletes across).
Looking at the code in 0.98 it's immediately obvious that deletes (unlike puts) are not batched
and hence sent to the other side one by one, cause a network RTT for each delete marker.

Looks like in trunk it's doing the right thing (using BufferedMutators for all mutations in
TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 1.2?) issue.


> [0.98] CopyTable is extremely slow when moving delete markers
> -------------------------------------------------------------
>
>                 Key: HBASE-14791
>                 URL: https://issues.apache.org/jira/browse/HBASE-14791
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.16
>            Reporter: Lars Hofhansl
>            Assignee: Alex Araujo
>
> We found that some of our CopyTable jobs run for many hours, even when there isn't that
> much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete markers (we
> use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike puts) are not
> batched and hence sent to the other side one by one, causing a network RTT for each delete
> marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for all mutations
> in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 1.2?) issue.
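
The round-trip cost described in the issue can be illustrated with a small,
self-contained Java sketch. It does not use the HBase client and is not the
0.98 or trunk code: FakeRemoteTable, copyUnbatched, copyBatched, and the batch
size of 100 are hypothetical names and values chosen for illustration, and each
send() call merely stands in for one network round trip to the destination
cluster.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DeleteBatchingSketch {

    /** Hypothetical stand-in for the remote table: one send() call = one round trip. */
    static final class FakeRemoteTable {
        int roundTrips = 0;
        int mutationsApplied = 0;

        void send(List<String> mutations) {
            roundTrips++;
            mutationsApplied += mutations.size();
        }
    }

    /** Sends every delete marker on its own, which is the slow pattern the issue describes. */
    static int copyUnbatched(FakeRemoteTable table, List<String> deleteMarkers) {
        for (String marker : deleteMarkers) {
            table.send(Collections.singletonList(marker));
        }
        return table.roundTrips;
    }

    /** Buffers delete markers and sends them in groups of batchSize, flushing the remainder. */
    static int copyBatched(FakeRemoteTable table, List<String> deleteMarkers, int batchSize) {
        List<String> buffer = new ArrayList<>(batchSize);
        for (String marker : deleteMarkers) {
            buffer.add(marker);
            if (buffer.size() >= batchSize) {
                table.send(buffer);
                buffer = new ArrayList<>(batchSize);
            }
        }
        if (!buffer.isEmpty()) {
            table.send(buffer); // final partial batch
        }
        return table.roundTrips;
    }

    public static void main(String[] args) {
        List<String> markers = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            markers.add("row-" + i); // placeholder row keys, not real data
        }
        System.out.println("unbatched round trips: "
                + copyUnbatched(new FakeRemoteTable(), markers));
        System.out.println("batched round trips:   "
                + copyBatched(new FakeRemoteTable(), markers, 100));
    }
}
```

In this toy model the number of round trips drops from one per delete marker
to one per batch, which is the effect the issue attributes to buffering all
mutations in trunk.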



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
