Date: Thu, 2 Apr 2015 16:33:54 +0000 (UTC)
From: "Jonathan Lawlor (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-13374) Small scanners (with particular configurations) do not return all rows


     [ https://issues.apache.org/jira/browse/HBASE-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Lawlor updated HBASE-13374:
------------------------------------
    Attachment: HBASE-13374-v1.patch

> Small scanners (with particular configurations) do not return all rows
> ----------------------------------------------------------------------
>
>                 Key: HBASE-13374
>                 URL: https://issues.apache.org/jira/browse/HBASE-13374
>             Project: HBase
>          Issue Type: Bug
>   Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.13
>            Reporter: Jonathan Lawlor
>            Assignee: Jonathan Lawlor
>            Priority: Blocker
>             Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13
>
>         Attachments: HBASE-13374-v1.patch, HBASE-13374-v1.patch, HBASE-13374-v1.patch, small-scanner-data-loss-tests-0.98.patch, small-scanner-data-loss-tests-branch-1.0+.patch
>
>
> I recently ran into a couple of data loss issues with small scans. Similar to HBASE-13262, these issues only appear when scans are configured in such a way that the max result size limit is reached before the caching limit is reached. As far as I can tell, this issue affects branches 0.98+.
> I should note that after investigation it looks like the root cause of these issues is not the same as HBASE-13262. Rather, these issues are caused by errors in the small scanner logic (I will explain in more depth below).
> Furthermore, I do know that the solution from HBASE-13262 has not made its way into small scanners (it is being addressed in HBASE-13335). As a result, I made sure to test these issues with the patch from HBASE-13335 applied, and I saw that they were still present.
> The following two issues have been observed (both lead to data loss):
> 1. When a small scan is configured with a caching value of Integer.MAX_VALUE, and a maxResultSize limit that is reached before the region is exhausted, integer overflow will occur. This eventually leads to a preemptive skip of the regions.
> 2. When a small scan is configured with a maxResultSize that is smaller than the size of a single row, the small scanner will jump between regions preemptively. This seems to happen because small scanners assume that, unless a region is exhausted, at least 2 rows will be returned from the server. This assumption isn't clearly stated in the small scanners but is implied through the use of {{skipRowOfFirstResult}}.
> Again, I would like to stress that the root cause of these issues is *NOT* related to the cause of HBASE-13262. These issues occur because of inappropriate assumptions made in the small scanner logic. The inappropriate assumptions are:
> 1. Integer overflow will not occur when incrementing caching
> 2. At least 2 rows will be returned from the server unless the region has been exhausted
> I am attaching a patch that contains tests that demonstrate these issues. If these issues should be split into separate JIRAs please let me know.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
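
The first assumption listed in the description (that incrementing caching cannot overflow) can be illustrated with a small, self-contained Java sketch. This is not the small scanner code and not the attached patch: the class and method names below are hypothetical and exist only to show what Java's wrapping `int` arithmetic does when the caching value is already Integer.MAX_VALUE, alongside one possible saturating guard.

```java
public class CachingOverflowSketch {

    // Hypothetical helper: bumps the caching value by one with plain int
    // arithmetic. Java int addition wraps silently on overflow.
    static int unsafeIncrement(int caching) {
        return caching + 1;
    }

    // Hypothetical guard: saturates at Integer.MAX_VALUE instead of wrapping.
    static int saturatingIncrement(int caching) {
        return caching == Integer.MAX_VALUE ? Integer.MAX_VALUE : caching + 1;
    }

    public static void main(String[] args) {
        int caching = Integer.MAX_VALUE;

        // Wraps around to Integer.MIN_VALUE, i.e. a negative row limit.
        System.out.println(unsafeIncrement(caching));     // prints -2147483648

        // Stays at the maximum, so the limit remains positive.
        System.out.println(saturatingIncrement(caching)); // prints 2147483647
    }
}
```

A negative caching value is the kind of input that could make scanner logic conclude it has nothing left to fetch, which is consistent with the preemptive region skip described in the report. Whether the actual fix saturates, uses a wider type, or avoids the increment altogether is not stated in this message.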