hbase-issues mailing list archives

From "Jonathan Lawlor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13262) ResultScanner doesn't return all rows in Scan
Date Tue, 24 Mar 2015 20:42:54 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378559#comment-14378559 ]

Jonathan Lawlor commented on HBASE-13262:

[~elserj] Nice patch, some review below

Nice tests, I especially like the new TestClientScanner

bq. // TODO Use the server's response about more results
Not sure what this line means. Do we need to check the moreResults flag here?

I like the idea of using the moreResults flag, but I believe we actually need to introduce a new
flag into the ScanResponse. Unfortunately, the name moreResults is a little misleading: it
sounds perfect for what we are trying to achieve, but it means something else. Looking into
RSRpcServices to see when moreResults is set to false, it looks like this happens only when
scanner.isFilterDone() is true. Looking closer, RegionScannerImpl#isFilterDone is only true
when the RegionScanner wants to indicate that the entire scan should stop (i.e. the client
shouldn't even try to change regions, the whole scan is done).

So to be clear, it looks as though the moreResults flag is false ONLY when the entire scan
needs to stop, NOT when a region is exhausted. The net effect is that moreResults will always
appear to be true client side, even when the region is exhausted. Thus, I think we will still
end up making that extra RPC that Lars mentioned above in order to see that the Result[] is
empty and thus the region is exhausted, before the region change occurs.
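
To make the extra RPC concrete, here is a small simulation of a client draining one region. It is only a sketch under simplifying assumptions: the class and method names are hypothetical (not HBase code), only the caching (row count) limit is modelled, and the size limit is ignored.

```java
// Hypothetical simulation; none of these names exist in HBase.
final class ExtraRpcSketch {

    // RPCs needed to drain one region when exhaustion is only visible
    // to the client as an empty batch of results.
    static int rpcsWithoutRegionFlag(int rowsInRegion, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        int rpcs = 0;
        int remaining = rowsInRegion;
        while (true) {
            rpcs++;
            int batch = Math.min(caching, remaining);
            remaining -= batch;
            if (batch == 0) {
                return rpcs; // empty batch: the client now knows the region is done
            }
        }
    }

    // RPCs needed when the server also reports whether the batch was cut
    // short by the caching limit (assumed region-specific flag).
    static int rpcsWithRegionFlag(int rowsInRegion, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        int rpcs = 0;
        int remaining = rowsInRegion;
        while (true) {
            rpcs++;
            int batch = Math.min(caching, remaining);
            remaining -= batch;
            boolean moreResultsInRegion = batch >= caching; // limit was reached
            if (!moreResultsInRegion) {
                return rpcs; // region reported as exhausted, no extra RPC needed
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(rpcsWithoutRegionFlag(250, 100)); // 4
        System.out.println(rpcsWithRegionFlag(250, 100));    // 3
    }
}
```

In this model the saving disappears when the row count is an exact multiple of the caching value, because the last full batch still looks like "limit reached".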

Since moreResults is a flag that is used for global scan logic (logic not specific to a particular
region), I think we need to introduce a new flag that is specific to the region's results.
If the result size limit or caching limit is reached inside RSRpcServices, return true, else
return false.
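
As a rough illustration of the proposal, the region-specific flag and the client's reaction to it might look like the sketch below. All names here (RegionFlagSketch, moreResultsInRegion, NextStep) are hypothetical and are not taken from RSRpcServices or ClientScanner; the sketch assumes the flag is true exactly when one of the two limits cut the batch short.

```java
// Hypothetical sketch; these names are illustrative and not part of HBase.
final class RegionFlagSketch {

    // What the client should do after receiving a scan response.
    enum NextStep { SCAN_SAME_REGION, MOVE_TO_NEXT_REGION, STOP_SCAN }

    // Server side (assumed): the region may still hold rows only if the
    // batch stopped because the caching limit or the size limit was reached.
    static boolean moreResultsInRegion(int rowsReturned, int cachingLimit,
                                       long bytesReturned, long maxResultSize) {
        boolean cachingLimitReached = rowsReturned >= cachingLimit;
        boolean sizeLimitReached = bytesReturned >= maxResultSize;
        return cachingLimitReached || sizeLimitReached;
    }

    // Client side (assumed): the global flag ends the whole scan, while the
    // region flag only decides whether to stay on the current region.
    static NextStep nextStep(boolean moreResults, boolean moreResultsInRegion) {
        if (!moreResults) {
            return NextStep.STOP_SCAN;
        }
        return moreResultsInRegion
            ? NextStep.SCAN_SAME_REGION
            : NextStep.MOVE_TO_NEXT_REGION;
    }

    public static void main(String[] args) {
        boolean inRegion = moreResultsInRegion(40, 100, 1024L, 2L * 1024 * 1024);
        System.out.println(nextStep(true, inRegion));
    }
}
```

Keeping the two flags separate means the existing meaning of moreResults (stop the entire scan) does not have to change.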

bq. // Server didn't respond whether it has more results or not.
Is it possible here that we may inadvertently interpret the missing flag as meaning the region
is exhausted? Probably fine because the limit logic is still in the ClientScanner while condition,
just wondering.
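
One way to guard against that misreading is to treat the flag as three-valued (true, false, absent) and fall back to the empty-batch check when it is absent. The following is only a sketch with hypothetical names; it models the optional flag with java.util.Optional rather than the actual ScanResponse accessors.

```java
import java.util.Optional;

// Hypothetical sketch; not the actual ClientScanner logic.
final class MissingFlagSketch {

    // Decide whether the current region is exhausted.
    // An absent flag (e.g. a server that never sets it) is NOT read as
    // "exhausted"; the client falls back to checking for an empty batch.
    static boolean regionExhausted(Optional<Boolean> moreResultsInRegion,
                                   int resultsInBatch) {
        if (moreResultsInRegion.isPresent()) {
            return !moreResultsInRegion.get();
        }
        return resultsInBatch == 0;
    }

    public static void main(String[] args) {
        // Flag missing and rows returned: keep scanning the same region.
        System.out.println(regionExhausted(Optional.empty(), 25));
    }
}
```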

> ResultScanner doesn't return all rows in Scan
> ---------------------------------------------
>                 Key: HBASE-13262
>                 URL: https://issues.apache.org/jira/browse/HBASE-13262
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.0.0, 1.1.0
>         Environment: Single node, pseudo-distributed 1.1.0-SNAPSHOT
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 2.0.0, 1.1.0, 0.98.13
>         Attachments: 13262-0.98-testpatch.txt, HBASE-13262-branch-1-v2.patch, HBASE-13262-branch-1.patch, HBASE-13262-v1.patch, HBASE-13262-v2.patch, HBASE-13262.patch, regionserver-logging.diff, testrun_0.98.txt, testrun_branch1.0.txt
> Tried to write a simple Java client against 1.1.0-SNAPSHOT.
> * Write 1M rows, each row with 1 family, and 10 qualifiers (values [0-9]), for a total of 10M cells written
> * Read back the data from the table, ensure I saw 10M cells
> Running it against {{04ac1891}} (and earlier) yesterday, I would get ~20% of the actual rows. Running against 1.0.0, returns all 10M records as expected.
> [Code I was running|https://github.com/joshelser/hbase-hwhat/blob/master/src/main/java/hbase/HBaseTest.java] for the curious.
