Date: Thu, 13 Mar 2014 00:01:56 +0000 (UTC)
From: "Andrew Purtell (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-10679) Both clients get wrong scan results if the first scanner expires and the second scanner is created with the same scannerId on the same region

    [ https://issues.apache.org/jira/browse/HBASE-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932674#comment-13932674 ]

Andrew Purtell commented on HBASE-10679:
----------------------------------------

+1, thanks!

> Both clients get wrong scan results if the first scanner expires and the second scanner is created with the same scannerId on the same region
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10679
>                 URL: https://issues.apache.org/jira/browse/HBASE-10679
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Feng Honghua
>            Assignee: Feng Honghua
>            Priority: Critical
>             Fix For: 0.96.2, 0.98.1, 0.99.0
>
>         Attachments: HBASE-10679-trunk_v1.patch, HBASE-10679-trunk_v2.patch, HBASE-10679-trunk_v2.patch, HBASE-10679-trunk_v2.patch
>
>
> The scenario is as follows (both Client A and Client B scan Region R):
> # A opens a scanner SA on R with scannerId N and successfully gets its first row, "a".
> # SA's lease expires and SA is removed from scanners.
> # B opens a scanner SB on R and is also assigned scannerId N. It successfully gets its first row, "m".
> # A issues its second scan request with scannerId N. The regionserver finds that N is a valid scannerId and that the region matches too (the region has been online on this regionserver the whole time and both scanners are against it), so it executes the request on SB and returns "n" to A. Wrong! A gets data from another client's scanner; it expects a row such as "b" that follows "a".
> # B issues its second scan request with scannerId N. The regionserver again considers it valid, executes the scan on SB, and returns "o" to B. Wrong! It should return "n", but "n" has just been consumed by A.
> The consequence is that both clients get wrong scan results:
> # A gets data from a scanner created by another client, because its own scanner has expired and been removed.
> # B misses data it should have received, because A has wrongly consumed it.
> The root cause is that a scannerId generated by the regionserver is not guaranteed to be unique over the regionserver's whole lifecycle. *The only guarantee is that the scannerIds of scanners that are currently valid (not expired) are unique*, so the same scannerId can appear in scanners again after an earlier scanner with that scannerId expires and is removed. If the second scanner is against the same region, the bug arises.
> In theory the scenario above should be very rare (two consecutive scans on the same region from two different clients get the same scannerId, and the first expires before the second is created), but it can happen. When it does, the consequence is severe (all clients involved get wrong data), and it should be extremely hard to diagnose or debug.
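
The five-step scenario and its root cause can be reproduced with a small, self-contained Java sketch. This is not HBase code and not necessarily what the attached patches do: RowCursor, ScannerRegistry, scenario, and the reuseFreedIds flag are hypothetical names invented for illustration. With reuseFreedIds set, ids are unique only among live scanners, which is the guarantee the description says the regionserver gives. With it cleared, ids come from an ever-increasing counter, which is one possible way to keep a stale id from ever matching a newer scanner.

```java
import java.util.HashMap;
import java.util.Map;

class ScannerIdSketch {

    /** Hypothetical stand-in for a region scanner: walks a fixed list of rows. */
    static final class RowCursor {
        private final String[] rows;
        private int pos = 0;

        RowCursor(String... rows) { this.rows = rows; }

        String next() { return pos < rows.length ? rows[pos++] : null; }
    }

    /** Hypothetical registry of open scanners for one region (single-threaded sketch). */
    static final class ScannerRegistry {
        private final Map<Long, RowCursor> scanners = new HashMap<>();
        private final boolean reuseFreedIds;
        private long lastId = 0;

        ScannerRegistry(boolean reuseFreedIds) { this.reuseFreedIds = reuseFreedIds; }

        long open(RowCursor cursor) {
            long id;
            if (reuseFreedIds) {
                // Unique only among live scanners: take the lowest id not in use,
                // so an id freed by expiry is handed out again.
                id = 1;
                while (scanners.containsKey(id)) {
                    id++;
                }
            } else {
                // Ever-increasing counter: an id is never handed out twice.
                id = ++lastId;
            }
            scanners.put(id, cursor);
            return id;
        }

        /** Models a lease expiring: the scanner is dropped from the map. */
        void expire(long id) { scanners.remove(id); }

        /** Returns the next row, or "expired" when the id matches no live scanner. */
        String next(long id) {
            RowCursor cursor = scanners.get(id);
            return cursor == null ? "expired" : cursor.next();
        }
    }

    /** Replays steps 1 to 5 of the description and reports what each client saw. */
    static String scenario(boolean reuseFreedIds) {
        ScannerRegistry region = new ScannerRegistry(reuseFreedIds);
        long idA = region.open(new RowCursor("a", "b", "c")); // step 1: A opens SA
        String a1 = region.next(idA);
        region.expire(idA);                                   // step 2: SA's lease expires
        long idB = region.open(new RowCursor("m", "n", "o")); // step 3: B opens SB
        String b1 = region.next(idB);
        String a2 = region.next(idA);                         // step 4: A reuses its stale id
        String b2 = region.next(idB);                         // step 5: B's second request
        return "A:" + a1 + "," + a2 + " B:" + b1 + "," + b2;
    }

    public static void main(String[] args) {
        System.out.println(scenario(true));  // A:a,n B:m,o
        System.out.println(scenario(false)); // A:a,expired B:m,n
    }
}
```

In the first run A's stale id resolves to SB, so A reads "n" and B skips straight to "o", matching the description. In the second run A's request fails the lookup instead, which is the kind of condition a real client would see as an unknown-scanner error, and B reads "m" then "n" as expected.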