hbase-issues mailing list archives

From "Tomu Tsuruhara (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17871) scan#setBatch(int) call leads wrong result of VerifyReplication
Date Thu, 06 Apr 2017 11:20:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958761#comment-15958761 ]

Tomu Tsuruhara commented on HBASE-17871:
----------------------------------------

Oops, sorry about the attachment order.

And again, the Hadoop QA run appears to have failed:
https://builds.apache.org/job/PreCommit-HBASE-Build/6348/console

{noformat}
Modes:  MultiJDK  Jenkins  Robot  Docker  ResetRepo  UnitTests 
Processing: HBASE-17871
ERROR: Unsure how to process HBASE-17871.
{noformat}

I'll attach the same patch again as v4.

> scan#setBatch(int) call leads wrong result of VerifyReplication
> ---------------------------------------------------------------
>
>                 Key: HBASE-17871
>                 URL: https://issues.apache.org/jira/browse/HBASE-17871
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 1.4.0
>            Reporter: Tomu Tsuruhara
>            Assignee: Tomu Tsuruhara
>            Priority: Minor
>         Attachments: after.png, beforethepatch.png, HBASE-17871.master.001.patch, HBASE-17871.master.002.patch, HBASE-17871.master.003.patch, HBASE-17871.master.003.patch, HBASE-17871.master.004.patch
>
>
> VerifyReplication tool printed weird logs.
> {noformat}
> 2017-04-03 23:30:50,252 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: CONTENT_DIFFERENT_ROWS, rowkey=a00001001930000
> 2017-04-03 23:30:50,280 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: ONLY_IN_PEER_TABLE_ROWS, rowkey=a00001001930000
> 2017-04-03 23:30:50,387 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: CONTENT_DIFFERENT_ROWS, rowkey=a00001003850000
> 2017-04-03 23:30:50,414 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: ONLY_IN_PEER_TABLE_ROWS, rowkey=a00001003850000
> 2017-04-03 23:30:50,480 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: CONTENT_DIFFERENT_ROWS, rowkey=a00001005320000
> 2017-04-03 23:30:50,508 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: ONLY_IN_PEER_TABLE_ROWS, rowkey=a00001005320000
> {noformat}
> Here, each bad row was marked as both {{CONTENT_DIFFERENT_ROWS}} and {{ONLY_IN_PEER_TABLE_ROWS}}.
> This should never happen, so I looked at the code and found a scan.setBatch call.
> {code}
>     @Override
>     public void map(ImmutableBytesWritable row, final Result value,
>                     Context context)
>         throws IOException {
>       if (replicatedScanner == null) {
>         ...
>         final Scan scan = new Scan();
>         scan.setBatch(batch);
> {code}
> As stated in HBASE-16376, a {{scan#setBatch(int)}} call implicitly allows scan results to be partial.
> Since {{VerifyReplication}} assumes that each {{scanner.next()}} call returns an entire row,
> partial results break the comparison logic.
> We should avoid the setBatch call here.
> Thanks to RPC chunking (explained in this blog post: https://blogs.apache.org/hbase/entry/scan_improvements_in_hbase_1),
> I think removing it is safe and acceptable.
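
The failure mode described above can be reproduced with a small stand-alone model. This is only a sketch, not the VerifyReplication source: it assumes a merge-style comparison in which the source side delivers whole rows while the peer scanner delivers at most `batch` cells per `next()` call. The names `BatchCompareSketch`, `Row`, `batched` and `compare` are hypothetical, and cells are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified model, NOT the HBase code: every name below is hypothetical.
class BatchCompareSketch {

  /** One scanner result: a row key plus the cells returned in a single next() call. */
  static final class Row {
    final String key;
    final List<String> cells;

    Row(String key, List<String> cells) {
      this.key = key;
      this.cells = cells;
    }
  }

  /** Splits every row into results of at most `batch` cells, as a batched scan may do. */
  static List<Row> batched(List<Row> rows, int batch) {
    List<Row> out = new ArrayList<>();
    for (Row r : rows) {
      for (int i = 0; i < r.cells.size(); i += batch) {
        int end = Math.min(i + batch, r.cells.size());
        out.add(new Row(r.key, r.cells.subList(i, end)));
      }
    }
    return out;
  }

  /** Merge-style comparison that assumes each peer result is a complete row. */
  static List<String> compare(List<Row> source, List<Row> peerResults) {
    List<String> log = new ArrayList<>();
    Iterator<Row> peer = peerResults.iterator();
    Row current = peer.hasNext() ? peer.next() : null;
    for (Row src : source) {
      while (true) {
        if (current == null) {
          log.add("ONLY_IN_SOURCE_TABLE_ROWS, rowkey=" + src.key);
          break;
        }
        int cmp = src.key.compareTo(current.key);
        if (cmp == 0) {
          // Same key: compare content, then advance the peer scanner.
          if (!src.cells.equals(current.cells)) {
            log.add("CONTENT_DIFFERENT_ROWS, rowkey=" + src.key);
          }
          current = peer.hasNext() ? peer.next() : null;
          break;
        } else if (cmp < 0) {
          log.add("ONLY_IN_SOURCE_TABLE_ROWS, rowkey=" + src.key);
          break;
        } else {
          // Peer key sorts before the source key: treated as a peer-only row.
          log.add("ONLY_IN_PEER_TABLE_ROWS, rowkey=" + current.key);
          current = peer.hasNext() ? peer.next() : null;
        }
      }
    }
    // Any peer results left after the source is exhausted are peer-only.
    while (current != null) {
      log.add("ONLY_IN_PEER_TABLE_ROWS, rowkey=" + current.key);
      current = peer.hasNext() ? peer.next() : null;
    }
    return log;
  }

  public static void main(String[] args) {
    List<Row> rows = Arrays.asList(
        new Row("row1", Arrays.asList("c1", "c2", "c3")),
        new Row("row2", Arrays.asList("c1")));
    // Both tables hold identical data; only the peer side is batched (batch = 2).
    for (String line : compare(rows, batched(rows, 2))) {
      System.out.println(line);
    }
  }
}
```

In this model the first peer result for `row1` carries only two of its three cells, so the content check fails. The leftover third cell then arrives as a separate result with the same key, and the next source row reports it as peer-only. That gives the same pair of messages per row key as in the log above, even though the two tables are identical.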



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
