Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Tue, 17 Mar 2015 23:36:39 +0000 (UTC)
From: "Josh Elser (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12782590.1426606028000.124669.1426635399893@Atlassian.JIRA>
In-Reply-To: <JIRA.12782590.1426606028000@Atlassian.JIRA>
References: <JIRA.12782590.1426606028000@Atlassian.JIRA>
 <JIRA.12782590.1426606028679@arcas>
Subject: [jira] [Commented] (HBASE-13262) ResultScanner doesn't return all
 rows in Scan
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366310#comment-14366310 ] 

Josh Elser commented on HBASE-13262:
------------------------------------

Kudos on tracking this down!

bq. The net effect is that the client checks its size limit and sees that the limit has not been reached, so it assumes that the region has been exhausted and moves the scanner to the next region... so as Josh Elser predicted, the root cause is that we jump between regions too soon....

Bingo. I just got to this point as well.

{panel:title=ClientScanner.java:482}
{code}
        } while (remainingResultSize > 0 && countdown > 0
            && (!partialResults.isEmpty() || possiblyNextScanner(countdown, values == null)));
{code}
{panel}

One important thing that I think I've convinced myself of: this only happens when there are no queued partial results in the client, because the presence of a partial result also forces the client to talk to the same region again.
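
To make that concrete, here is a stripped-down sketch of how I read the guard quoted above. It is not the real ClientScanner code: the class {{ScanLoopSketch}}, the enum {{Next}}, and the method {{decide}} are names made up for illustration, and I'm assuming that {{possiblyNextScanner}} is where the client decides the current region is done. It only models the short-circuit in the {{||}}.

```java
public class ScanLoopSketch {
    /** What the client does after one RPC returns. Names are illustrative only. */
    enum Next { STOP, SAME_REGION, TRY_NEXT_REGION }

    /**
     * Mirrors the shape of the quoted guard:
     *   remainingResultSize > 0 && countdown > 0
     *     && (!partialResults.isEmpty() || possiblyNextScanner(...))
     */
    static Next decide(long remainingResultSize, int countdown, boolean partialsQueued) {
        if (remainingResultSize <= 0 || countdown <= 0) {
            // Size or row budget used up: leave the loop and hand results to the caller.
            return Next.STOP;
        }
        if (partialsQueued) {
            // The || short-circuits, so possiblyNextScanner is never consulted.
            return Next.SAME_REGION;
        }
        // Assumption: with budget left and nothing queued, the region looks
        // exhausted to the client, so it asks for the next scanner.
        return Next.TRY_NEXT_REGION;
    }

    public static void main(String[] args) {
        // Budget left, nothing queued: the client moves on (the suspicious case).
        System.out.println(decide(1024L, 10, false));
        // Same budget, but a partial Result is queued: stay on the same region.
        System.out.println(decide(1024L, 10, true));
        // Size budget exhausted: stop, whatever else is queued.
        System.out.println(decide(0L, 10, false));
    }
}
```

If the client's idea of the remaining size disagrees with the server's, the first case is the one that would skip the rest of a region; the second case would not, which matches the observation above.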

I'll take a look at the stuff you attached (again, much appreciated) and see if I can chow down on the rest of your analysis and merge that in with what I (think I) figured out.

> ResultScanner doesn't return all rows in Scan
> ---------------------------------------------
>
>                 Key: HBASE-13262
>                 URL: https://issues.apache.org/jira/browse/HBASE-13262
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.0.0, 1.1.0
>         Environment: Single node, pseudo-distributed 1.1.0-SNAPSHOT
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: testrun_0.98.txt, testrun_branch1.0.txt
>
>
> Tried to write a simple Java client against 1.1.0-SNAPSHOT.
> * Write 1M rows, each row with 1 family, and 10 qualifiers (values [0-9]), for a total of 10M cells written
> * Read back the data from the table and check that I see 10M cells
> Running it against {{04ac1891}} (and earlier) yesterday, I would get ~20% of the actual rows. Running against 1.0.0 returns all 10M records as expected.
> [Code I was running|https://github.com/joshelser/hbase-hwhat/blob/master/src/main/java/hbase/HBaseTest.java] for the curious.

