hbase-issues mailing list archives

From "chenrongwei (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15097) When the scan operation covered two regions,sometimes the final results have duplicated rows.
Date Sat, 23 Jan 2016 14:02:40 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113755#comment-15113755 ]

chenrongwei commented on HBASE-15097:
-------------------------------------

I don't think that can happen. When no stop row has been set, there are two cases:
1: The table has only one region, (null, null).
2: The table has more than one region, such as (null, region_1_endKey) ... [region_n-1_startKey,
region_n_startKey), [region_n_startKey, null).
If the table has only one region, the problem obviously cannot occur, because all the data is
in the same region, so we only need to look at the second case.
In the second case, without the patch, a region may still hold old data that belonged to it
before it was split, so the scan may return duplicate rows. But I think this mistake, where
the region scan returns old data, can only happen in regions other than the last one. No row
key can sort beyond the last region's end key (null), so the last region always has the
newest data. For that reason, we only need to make sure the other regions don't make this
mistake, and the scan will then avoid returning old data. That is exactly what this patch does.
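
As a rough sketch of this argument (not the code in the attached patches): the class
StopRowSketch and its methods are made-up names, EMPTY stands in for
HConstants.EMPTY_END_ROW, and the rule "every region except the last stops at its own end
key" is my reading of the reasoning above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not HBase code: models the upper bound a single
// region scanner would apply under the reasoning in this comment.
final class StopRowSketch {
    // Stand-in for HConstants.EMPTY_END_ROW (an empty byte array).
    static final byte[] EMPTY = new byte[0];

    // Lexicographic comparison of unsigned bytes, the ordering used for row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Returns the exclusive upper bound for one region; EMPTY means "unbounded".
    static byte[] effectiveStopRow(byte[] scanStopRow, byte[] regionEndKey) {
        if (regionEndKey.length == 0) {
            // Last region: no row key can sort past its end key,
            // so only the scan's own stop row (if any) applies.
            return scanStopRow;
        }
        if (scanStopRow.length == 0) {
            // No stop row on the scan: a non-last region still stops at its end key.
            return regionEndKey;
        }
        // Both bounds are set: use the smaller one.
        return compare(scanStopRow, regionEndKey) >= 0 ? regionEndKey : scanStopRow;
    }

    // Convenience conversion for the examples below.
    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // No stop row, first region ends at "m": bounded by "m".
        System.out.println(new String(effectiveStopRow(EMPTY, b("m")), StandardCharsets.UTF_8));
        // No stop row, last region: stays unbounded (length 0).
        System.out.println(effectiveStopRow(EMPTY, EMPTY).length);
    }
}
```

With no stop row set, only the last region is left unbounded, which is the case argued above
to be safe.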

> When the scan operation covered two regions,sometimes the final results have duplicated rows.
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-15097
>                 URL: https://issues.apache.org/jira/browse/HBASE-15097
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.1.2
>         Environment: centos 6.5
> hbase 1.1.2 
>            Reporter: chenrongwei
>            Assignee: chenrongwei
>         Attachments: HBASE-15097-v001.patch, HBASE-15097-v002.patch, HBASE-15097-v003.patch,
> HBASE-15097-v004.patch, output.log, rowkey.txt, snapshot2016-01-13 pm 8.42.37.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When the scan operation's start key and end key cover two regions, the first region
> returns rows that are beyond its end key. This finally leads to duplicated rows in the
> results.
> To avoid this problem, we should add a check before setting the variable "stopRow" in the
> HRegion class, as follows:
>             if (Bytes.equals(scan.getStopRow(), HConstants.EMPTY_END_ROW) && !scan.isGetScan()) {
>                 this.stopRow = null;
>             } else {
>                 if (Bytes.compareTo(scan.getStopRow(), this.getRegionInfo().getEndKey()) >= 0) {
>                     this.stopRow = this.getRegionInfo().getEndKey();
>                 } else {
>                     this.stopRow = scan.getStopRow();
>                 }
>             }
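
The duplication described in the quoted report can be illustrated with a toy model. This is
only a sketch under stated assumptions: DuplicateScanSketch, Region and scan are made-up
names, plain strings stand in for byte[] row keys, and the stale row "p" in the first region
is invented sample data, not taken from the attached output.log or rowkey.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model, not HBase code: shows how a row held beyond a region's end key
// is returned twice by a two-region scan unless each region stops at its end key.
final class DuplicateScanSketch {
    // A toy region: its exclusive end key and the sorted rows it physically holds.
    // An empty end key marks the last region.
    static final class Region {
        final String endKey;
        final List<String> rows;

        Region(String endKey, String... rows) {
            this.endKey = endKey;
            this.rows = Arrays.asList(rows);
        }
    }

    // Scans the regions in order. An empty stopRow means "no stop row".
    // When clampToEndKey is true, a non-last region also stops at its own end key.
    static List<String> scan(List<Region> regions, String stopRow, boolean clampToEndKey) {
        List<String> out = new ArrayList<>();
        for (Region r : regions) {
            for (String row : r.rows) {
                if (!stopRow.isEmpty() && row.compareTo(stopRow) >= 0) {
                    break; // reached the scan's stop row
                }
                if (clampToEndKey && !r.endKey.isEmpty() && row.compareTo(r.endKey) >= 0) {
                    break; // reached this region's end key
                }
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Region 1 ends at "m" but still holds the stale row "p";
        // region 2 is the last region and owns "p".
        List<Region> regions = Arrays.asList(
                new Region("m", "a", "f", "p"),
                new Region("", "m", "p", "t"));
        System.out.println(scan(regions, "z", false)); // [a, f, p, m, p, t]
        System.out.println(scan(regions, "z", true));  // [a, f, m, p, t]
    }
}
```

Without the per-region bound, "p" appears once from each region; with it, the first region
stops at "m" and each row is returned once.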



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
