hbase-issues mailing list archives

From "Phil Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17871) scan#setBatch(int) call leads wrong result of VerifyReplication
Date Wed, 05 Apr 2017 10:03:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956616#comment-15956616 ]

Phil Yang commented on HBASE-17871:
-----------------------------------

I think CONTENT_DIFFERENT_ROWS and ONLY_IN_PEER_TABLE_ROWS are both logged because the
comparison is also batched. For example, suppose the row has 500 cells in the source cluster
and 600 cells in the peer cluster, and setBatch(500) is used. There will be two comparisons:
the first is 500 cells vs 500 cells, and the second is 0 vs 100. So if the first comparison
finds a difference, we will see both CONTENT_DIFFERENT_ROWS and ONLY_IN_PEER_TABLE_ROWS.
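
The 500 vs 600 cell example can be sketched as a small standalone simulation. This is not
the VerifyReplication code and it does not use the HBase client; the class name
BatchedCompareSketch and the helpers makeCells, chunk and compare are hypothetical, and cells
are modeled as plain strings. Only the counter names are borrowed from the tool. It assumes
that each batch is compared independently, as described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of a batched row comparison; not HBase code.
class BatchedCompareSketch {

  // Build `count` fake cells for one row; only the first cell's value varies.
  static List<String> makeCells(int count, String firstValue) {
    List<String> cells = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      cells.add("q" + i + "=" + (i == 0 ? firstValue : "v"));
    }
    return cells;
  }

  // Split one row into pieces of at most `batch` cells, mimicking how a
  // batched scan hands back a wide row in several partial results.
  static List<List<String>> chunk(List<String> cells, int batch) {
    List<List<String>> out = new ArrayList<>();
    for (int i = 0; i < cells.size(); i += batch) {
      int end = Math.min(i + batch, cells.size());
      out.add(new ArrayList<>(cells.subList(i, end)));
    }
    return out;
  }

  // Compare piece by piece and collect the counter names that would be
  // reported for this single row key.
  static List<String> compare(List<String> source, List<String> peer, int batch) {
    List<List<String>> s = chunk(source, batch);
    List<List<String>> p = chunk(peer, batch);
    List<String> logged = new ArrayList<>();
    int rounds = Math.max(s.size(), p.size());
    for (int i = 0; i < rounds; i++) {
      if (i >= s.size()) {
        logged.add("ONLY_IN_PEER_TABLE_ROWS");    // source has no piece left
      } else if (i >= p.size()) {
        logged.add("ONLY_IN_SOURCE_TABLE_ROWS");  // peer has no piece left
      } else if (!s.get(i).equals(p.get(i))) {
        logged.add("CONTENT_DIFFERENT_ROWS");     // both present, cells differ
      }
    }
    return logged;
  }

  public static void main(String[] args) {
    List<String> source = makeCells(500, "a");
    List<String> peer = makeCells(600, "b");
    // Batch of 500: two comparisons (500 vs 500, then 0 vs 100).
    System.out.println(compare(source, peer, 500));
    // Batch larger than either row: a single whole-row comparison.
    System.out.println(compare(source, peer, 1000));
  }
}
```

With a batch of 500 the same row key is reported under two counters; with a batch larger
than the row there is one comparison and only CONTENT_DIFFERENT_ROWS is reported.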

I think this behavior is as expected? If you don't want to see this, you should not call
setBatch. But the feature is still useful for preventing OOM in map tasks. RPC chunking is
meant to prevent OOM on the server side.

> scan#setBatch(int) call leads wrong result of VerifyReplication
> ---------------------------------------------------------------
>
>                 Key: HBASE-17871
>                 URL: https://issues.apache.org/jira/browse/HBASE-17871
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 1.4.0
>            Reporter: Tomu Tsuruhara
>            Assignee: Tomu Tsuruhara
>            Priority: Minor
>         Attachments: HBASE-17871.master.001.patch
>
>
> VerifyReplication tool printed weird logs.
> {noformat}
> 2017-04-03 23:30:50,252 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: CONTENT_DIFFERENT_ROWS, rowkey=a00001001930000
> 2017-04-03 23:30:50,280 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: ONLY_IN_PEER_TABLE_ROWS, rowkey=a00001001930000
> 2017-04-03 23:30:50,387 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: CONTENT_DIFFERENT_ROWS, rowkey=a00001003850000
> 2017-04-03 23:30:50,414 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: ONLY_IN_PEER_TABLE_ROWS, rowkey=a00001003850000
> 2017-04-03 23:30:50,480 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: CONTENT_DIFFERENT_ROWS, rowkey=a00001005320000
> 2017-04-03 23:30:50,508 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: ONLY_IN_PEER_TABLE_ROWS, rowkey=a00001005320000
> {noformat}
> Here, each bad row was marked as both {{CONTENT_DIFFERENT_ROWS}} and {{ONLY_IN_PEER_TABLE_ROWS}}.
> This should never happen, so I took a look at the code and found a scan.setBatch call.
> {code}
>     @Override
>     public void map(ImmutableBytesWritable row, final Result value,
>                     Context context)
>         throws IOException {
>       if (replicatedScanner == null) {
>         ...
>         final Scan scan = new Scan();
>         scan.setBatch(batch);
> {code}
> As stated in HBASE-16376, a {{scan#setBatch(int)}} call implicitly allows scan results to be partial.
> Since {{VerifyReplication}} assumes each {{scanner.next()}} call returns an entire row,
> partial results break the compare logic.
> We should avoid the setBatch call here.
> Thanks to RPC chunking (explained in this blog post: https://blogs.apache.org/hbase/entry/scan_improvements_in_hbase_1),
> I think that is safe and acceptable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
