hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10679) Both clients get wrong scan results if the first scanner expires and the second scanner is created with the same scannerId on the same region
Date Sat, 08 Mar 2014 01:35:43 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924638#comment-13924638 ]

Hadoop QA commented on HBASE-10679:

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  against trunk revision .
  ATTACHMENT ID: 12633483

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 1.0 profile.

    {color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 1.1 profile.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100

    {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

    {color:red}-1 core tests{color}.  The patch failed these unit tests:

    {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s): at org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:354)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8929//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8929//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8929//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8929//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8929//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8929//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8929//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8929//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8929//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8929//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8929//console

This message is automatically generated.

> Both clients get wrong scan results if the first scanner expires and the second scanner is created with the same scannerId on the same region
> ---------------------------------------------------------------------------------------------------------------------------------------------
>                 Key: HBASE-10679
>                 URL: https://issues.apache.org/jira/browse/HBASE-10679
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Feng Honghua
>            Assignee: Feng Honghua
>            Priority: Critical
>         Attachments: HBASE-10679-trunk_v1.patch, HBASE-10679-trunk_v2.patch, HBASE-10679-trunk_v2.patch
> The scenario is as follows (both Client A and Client B scan Region R):
> # A opens a scanner SA on R with scannerId N and successfully gets its first row, "a".
> # SA's lease expires and SA is removed from scanners.
> # B opens a scanner SB on R, and its scannerId is also N. It successfully gets its first row.
> # A issues its second scan request with scannerId N. The regionserver finds that N is a valid scannerId and that the region matches too (the region stays online on this regionserver and both scanners are against it), so it executes the scan request on SB and returns "n" to A. This is wrong: A is served data from another client's scanner, when it expects a row such as "b" that follows "a".
> # B issues its second scan request with scannerId N. The regionserver also considers it valid, executes the scan on SB, and returns "o" to B. This is also wrong: B should get "n", but "n" has just been consumed by A.
> The consequence is that both clients get wrong scan results:
> # A gets data from a scanner created by another client, because its own scanner has expired and been removed.
> # B misses data that it should have received, because A has wrongly consumed it.
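
The five steps above can be replayed with a small, self-contained Java sketch. It is only an illustration under stated assumptions: the class, the map-based scanner table, and the methods `open`, `expire`, and `next` are hypothetical stand-ins and not the regionserver's actual code. The scannerId value 42 and every row value other than "a", "n", and "o" (including B's first row "m") are made up for the demo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ScannerIdReuseDemo {
    // Hypothetical stand-in for the regionserver's scanner table: scannerId -> open scanner.
    static final Map<Long, Iterator<String>> scanners = new HashMap<>();

    // Registers a scanner under the given id (a real server would pick the id itself).
    static void open(long scannerId, List<String> rows) {
        scanners.put(scannerId, rows.iterator());
    }

    // Models lease expiry: the scanner is dropped from the table.
    static void expire(long scannerId) {
        scanners.remove(scannerId);
    }

    // Serves the next row for whatever scanner is currently registered under the id.
    // The lookup is by id only, so it cannot tell which client opened the scanner.
    static String next(long scannerId) {
        Iterator<String> scanner = scanners.get(scannerId);
        if (scanner == null) {
            throw new IllegalStateException("unknown scanner " + scannerId);
        }
        return scanner.next();
    }

    // Replays the scenario and records which row each client receives.
    static List<String> replay() {
        List<String> log = new ArrayList<>();
        long n = 42L; // illustrative scannerId N

        open(n, Arrays.asList("a", "b", "c")); // step 1: A opens SA
        log.add("A:" + next(n));               //         A gets "a"
        expire(n);                             // step 2: SA's lease expires
        open(n, Arrays.asList("m", "n", "o")); // step 3: B opens SB with the same id
        log.add("B:" + next(n));               //         B gets its first row (assumed "m")
        log.add("A:" + next(n));               // step 4: A is served from SB, expected "b"
        log.add("B:" + next(n));               // step 5: B skips the row A consumed
        return log;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [A:a, B:m, A:n, B:o]
    }
}
```

In this model the table is keyed by scannerId alone, so once SB is registered under N, A's stale request cannot be told apart from B's.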
> The root cause is that a scannerId generated by the regionserver is not guaranteed to be unique over the regionserver's whole lifecycle. *The only guarantee is that the scannerIds of scanners that are currently still valid (not expired) are unique*, so the same scannerId can appear in scanners again after an earlier scanner with that scannerId has expired and been removed. If the second scanner is against the same region, the bug arises.
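
The difference between "unique among live scanners" and "unique over the server's lifetime" can be sketched as two toy id allocators. Both classes are hypothetical and written only for this illustration. The first deliberately hands out the lowest free id so that reuse is deterministic, whereas a real server might pick ids differently and see reuse only rarely. The monotonic counter is one possible way to avoid reuse within a single process lifetime and is not necessarily what the attached patches do.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

public class ScannerIdAllocators {

    /** Ids are unique only among live scanners: a released id may be handed out again. */
    static class LiveUniqueAllocator {
        private final Set<Long> live = new HashSet<>();

        long allocate() {
            // Lowest free id first (an assumption that makes the reuse easy to see).
            long id = 0;
            while (!live.add(id)) {
                id++;
            }
            return id;
        }

        void release(long id) {
            live.remove(id);
        }
    }

    /** Ids come from an ever-increasing counter, so a released id is never reissued. */
    static class MonotonicAllocator {
        private final AtomicLong nextId = new AtomicLong();

        long allocate() {
            return nextId.getAndIncrement();
        }

        void release(long id) {
            // Nothing to do: released ids are simply never handed out again.
        }
    }

    // Opens a scanner, lets it expire, opens another, and reports whether the id repeated.
    static boolean liveUniqueReusesId() {
        LiveUniqueAllocator allocator = new LiveUniqueAllocator();
        long first = allocator.allocate();
        allocator.release(first);
        return allocator.allocate() == first;
    }

    static boolean monotonicReusesId() {
        MonotonicAllocator allocator = new MonotonicAllocator();
        long first = allocator.allocate();
        allocator.release(first);
        return allocator.allocate() == first;
    }

    public static void main(String[] args) {
        System.out.println("live-unique allocator reuses id: " + liveUniqueReusesId()); // true
        System.out.println("monotonic allocator reuses id:   " + monotonicReusesId()); // false
    }
}
```

A counter like this only rules out reuse until the process restarts. Whether ids must also stay distinct across restarts is outside what this sketch covers.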
> In theory, this scenario should be very rare (two consecutive scans on the same region from two different clients get the same scannerId, and the first expires before the second is created), but it can happen. When it does, the consequence is severe (all clients involved get wrong data), and it is likely to be extremely hard to diagnose and debug.

This message was sent by Atlassian JIRA
