From: "Josh Elser (JIRA)"
To: issues@hbase.apache.org
Date: Tue, 17 Mar 2015 23:36:39 +0000 (UTC)
Subject: [jira] [Commented] (HBASE-13262) ResultScanner doesn't return all rows in Scan

    [ https://issues.apache.org/jira/browse/HBASE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366310#comment-14366310 ]

Josh Elser commented on HBASE-13262:
------------------------------------

Kudos on tracking this down!

bq. The net effect is that the client checks its size limit and sees that the limit has not been reached, so it assumes that the region has been exhausted and moves the scanner to the next region... so as Josh Elser predicted, the root cause is that we jump between regions too soon....

Bingo. I just got to this point as well.
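
To make the failure mode in the quote concrete, here is a toy model of it. This is a rough sketch and not HBase code: {{Region}}, {{nextBatch}}, {{countRows}} and all the numbers are hypothetical, and the reason the two sides disagree (each side charging a different size per row) is an assumption made only for illustration. The one thing taken from the analysis above is the inference itself: "my size limit was not reached" is treated as "this region is exhausted".

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these names exist in HBase.
class PrematureRegionJump {

    /** A region holding a fixed number of rows, served in size-limited batches. */
    static final class Region {
        private final int rows;
        private int next = 0;

        Region(int rows) {
            this.rows = rows;
        }

        /** Returns rows until the region is empty or the server-side size limit is hit. */
        List<Integer> nextBatch(long limit, long serverRowSize) {
            List<Integer> batch = new ArrayList<>();
            long used = 0;
            while (next < rows && used < limit) {
                batch.add(next++);
                used += serverRowSize;
            }
            return batch;
        }
    }

    /** Scans every region and returns how many rows the client actually saw. */
    static int countRows(int regionCount, int rowsPerRegion, long limit,
                         long serverRowSize, long clientRowSize) {
        List<Region> regions = new ArrayList<>();
        for (int i = 0; i < regionCount; i++) {
            regions.add(new Region(rowsPerRegion));
        }
        int seen = 0;
        for (Region region : regions) {
            while (true) {
                List<Integer> batch = region.nextBatch(limit, serverRowSize);
                seen += batch.size();
                // The client re-measures the batch with its own per-row size.
                long remaining = limit - batch.size() * clientRowSize;
                // Flawed inference: "limit not reached" is read as "region exhausted".
                if (batch.isEmpty() || remaining > 0) {
                    break; // move on to the next region
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Both sides charge 10 per row: every row is returned.
        System.out.println("consistent accounting: " + countRows(2, 25, 100, 10, 10) + " of 50 rows");
        // Client charges 8 per row: it leaves each region after the first batch.
        System.out.println("mismatched accounting: " + countRows(2, 25, 100, 10, 8) + " of 50 rows");
    }
}
```

With consistent accounting the short final batch really does mean the region is done. Once the client's measurement comes in under the server's, a full batch looks short and the rest of the region is skipped.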
{panel:title=ClientScanner.java:482}
{code}
} while (remainingResultSize > 0 && countdown > 0
    && (!partialResults.isEmpty() || possiblyNextScanner(countdown, values == null)));
{code}
{panel}

One important thing that I think I've convinced myself of is that this happens only when there are no queued partial results in the client (the presence of a partial result would also force the client to talk to the same region again).

I'll take a look at the stuff you attached (again, much appreciated) and see if I can chow down on the rest of your analysis and merge it with what I (think I) figured out.

> ResultScanner doesn't return all rows in Scan
> ---------------------------------------------
>
>                 Key: HBASE-13262
>                 URL: https://issues.apache.org/jira/browse/HBASE-13262
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.0.0, 1.1.0
>         Environment: Single node, pseudo-distributed 1.1.0-SNAPSHOT
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: testrun_0.98.txt, testrun_branch1.0.txt
>
>
> Tried to write a simple Java client against 1.1.0-SNAPSHOT.
> * Write 1M rows, each row with 1 family and 10 qualifiers (values [0-9]), for a total of 10M cells written
> * Read back the data from the table, ensure I saw 10M cells
> Running it against {{04ac1891}} (and earlier) yesterday, I would get ~20% of the actual rows. Running against 1.0.0 returns all 10M records as expected.
> [Code I was running|https://github.com/joshelser/hbase-hwhat/blob/master/src/main/java/hbase/HBaseTest.java] for the curious.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)