hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-13262) ResultScanner doesn't return all rows in Scan
Date Fri, 20 Mar 2015 05:20:40 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370740#comment-14370740 ]

Lars Hofhansl edited comment on HBASE-13262 at 3/20/15 5:20 AM:
----------------------------------------------------------------

bq. So 0.98 has the issue too? (Jonathan Lawlor seems to indicate not? Maybe I misread)

[~stack] So the general issue is that the counting on the client and the server needs to match
100% in order for this to work.

There are multiple ways to make this fail:
* accidentally, with incorrect code, which led to this issue
* having client and server configured with different values for the max scanner result size
in 0.98 or 1.0 (which I demonstrate in the test in the patch)
* probably more
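
To make the failure modes above concrete, here is a minimal toy model in plain Java. It is not HBase code: the class, method, and parameter names are all hypothetical, and it assumes a simplified protocol in which the server stops a batch on a row limit or a size limit, and the client decides a region is exhausted whenever its own accounting says neither limit was hit.

```java
/** Toy model of a row- and size-limited region scan; every name here is hypothetical. */
class ScanCountingSketch {

    /** Server side: rows returned for one RPC, limited by row count and accumulated size. */
    static int serverBatch(int regionRows, int start, int caching,
                           long serverMaxSize, long serverRowSize) {
        int returned = 0;
        long size = 0;
        while (start + returned < regionRows && returned < caching && size < serverMaxSize) {
            size += serverRowSize;
            returned++;
        }
        return returned;
    }

    /**
     * Client side: scans one region and returns how many rows it saw before
     * concluding, from its own accounting, that the region was exhausted.
     */
    static int scanRegion(int regionRows, int caching,
                          long serverMaxSize, long serverRowSize,
                          long clientMaxSize, long clientRowSize) {
        int seen = 0;
        while (true) {
            int got = serverBatch(regionRows, seen, caching, serverMaxSize, serverRowSize);
            seen += got;
            boolean rowLimitHit = got >= caching;
            boolean sizeLimitHit = got * clientRowSize >= clientMaxSize;
            // Neither limit hit by the client's count: it assumes the region is done.
            if (got == 0 || (!rowLimitHit && !sizeLimitHit)) {
                return seen;
            }
        }
    }

    public static void main(String[] args) {
        // Client and server agree on sizes and limits.
        System.out.println(scanRegion(1000, 100, 1000, 100, 1000, 100));
        // Client estimates each row as 90 bytes where the server counts 100.
        System.out.println(scanRegion(1000, 100, 1000, 100, 1000, 90));
        // Client is configured with a larger max result size than the server.
        System.out.println(scanRegion(1000, 100, 1000, 100, 2000, 100));
    }
}
```

In this model the first call sees all 1000 rows, while the other two give up after the first batch of 10 rows, because the client believes the size limit was not reached and so treats the short batch as the end of the region.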

The sizing on the client is fickle and bad. It _has_ to go.
The patch I propose just does away with the sizing on the client for 0.98 and 1.0. That will
cause an extra RPC if the scanner caching is set such that it would fire after the size limit;
in that case we need the extra RPC to detect that we're done with a region.
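
As a rough illustration of that trade-off (not the actual patch), the following Java sketch models a client that keeps no size accounting at all. It assumes, as one possible reading of the idea, that such a client can only treat an empty response as proof that the region is exhausted; all names are hypothetical.

```java
/** Toy model of a client that does no size accounting; every name here is hypothetical. */
class NoClientSizingSketch {

    /** Server side: rows returned for one RPC, limited by row count and accumulated size. */
    static int serverBatch(int regionRows, int start, int caching, long maxSize, long rowSize) {
        int returned = 0;
        long size = 0;
        while (start + returned < regionRows && returned < caching && size < maxSize) {
            size += rowSize;
            returned++;
        }
        return returned;
    }

    /** Returns {rowsSeen, rpcCount}. The client never inspects result sizes. */
    static int[] scanRegion(int regionRows, int caching, long maxSize, long rowSize) {
        int seen = 0;
        int rpcs = 0;
        while (true) {
            int got = serverBatch(regionRows, seen, caching, maxSize, rowSize);
            rpcs++;
            if (got == 0) {
                // Only an empty batch tells this client that the region is finished.
                return new int[] {seen, rpcs};
            }
            seen += got;
        }
    }

    public static void main(String[] args) {
        int[] result = scanRegion(1005, 100, 1000, 100);
        System.out.println(result[0] + " rows in " + result[1] + " RPCs");
    }
}
```

For 1005 rows fetched in size-limited batches of 10, this model sees every row but issues 102 RPCs: 101 that carry rows and one empty RPC that confirms the region is finished.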

In 1.1 and later we can do what has been proposed here in various forms and add some extra
flag to the RPC to indicate whether we filled the batch rather than trying to derive this
information from the size of the results array.
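
A minimal sketch of the flag idea, again as a plain-Java toy model rather than real HBase RPC code. The `ScanReply` type and its `limitReached` field are hypothetical names chosen for illustration; the assumption is that the server sets the flag when it stopped a batch because a limit was hit, so the client never has to recompute sizes.

```java
/** Toy model of a scan reply that carries an explicit flag; every name here is hypothetical. */
class BatchFlagSketch {

    /** Reply for one RPC: number of rows plus a server-computed flag. */
    static final class ScanReply {
        final int rows;
        final boolean limitReached; // true if the server stopped because a limit was hit

        ScanReply(int rows, boolean limitReached) {
            this.rows = rows;
            this.limitReached = limitReached;
        }
    }

    /** Server side: fill a batch and report whether a row or size limit ended it. */
    static ScanReply serverBatch(int regionRows, int start, int caching,
                                 long maxSize, long rowSize) {
        int returned = 0;
        long size = 0;
        while (start + returned < regionRows && returned < caching && size < maxSize) {
            size += rowSize;
            returned++;
        }
        boolean limitReached = returned > 0 && (returned >= caching || size >= maxSize);
        return new ScanReply(returned, limitReached);
    }

    /** Returns {rowsSeen, rpcCount}. The client trusts the flag and does no sizing. */
    static int[] scanRegion(int regionRows, int caching, long maxSize, long rowSize) {
        int seen = 0;
        int rpcs = 0;
        while (true) {
            ScanReply reply = serverBatch(regionRows, seen, caching, maxSize, rowSize);
            rpcs++;
            seen += reply.rows;
            if (!reply.limitReached) {
                // The server ran out of rows before hitting a limit: region is done.
                return new int[] {seen, rpcs};
            }
        }
    }

    public static void main(String[] args) {
        int[] result = scanRegion(1005, 100, 1000, 100);
        System.out.println(result[0] + " rows in " + result[1] + " RPCs");
    }
}
```

For 1005 rows in size-limited batches of 10, this model sees all rows in 101 RPCs: the final short batch arrives with the flag unset, so no additional empty RPC is needed. When the last batch happens to fill exactly (for example 1000 rows), one more RPC is still required in this model.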

In either case the matching size calculation on the client is bad and should be removed in
all cases.

Am I making sense? Maybe we should have two different jira...?

Edit: Fixed lots of spelling errors...


was (Author: lhofhansl):
bq. So 0.98 has the issue too? (Jonathan Lawlor seems to indicate not? Maybe I misread)

[~stack] So the general issue is that the counting on client and server need to match 100%
in order for this to work .

There are multiple ways to make this fail:
* accidentally with incorrect, which lead to this issue
* having client and server configured with different value for for the max scanner result
size in 0.98 or 1.0 (that I demonstrate in the test in patch)
* probably more

The sizing on the client is fickle and bad. It _has_ to go.
The patch I propose just does away with the sizing on the client for 0.98 and 1.0. That will
cause an extra RPC if the scanner caching is set such that it would fire after the size limit,
in that case we need the extra RPC to detect that we're doing with a region.

In 1.1 and later we do what has been proposed here in various forms and add some extra flag
to the RPC to indicate whether we filled the batch rather than trying to derive this information
for the size of the results array.

In either case the matching size calculation on the client is bad.

Am I making sense? Maybe we should have two different jira...?


> ResultScanner doesn't return all rows in Scan
> ---------------------------------------------
>
>                 Key: HBASE-13262
>                 URL: https://issues.apache.org/jira/browse/HBASE-13262
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.0.0, 1.1.0
>         Environment: Single node, pseudo-distributed 1.1.0-SNAPSHOT
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 2.0.0, 1.1.0, 0.98.13
>
>         Attachments: 13262-0.98-testpatch.txt, 13262-tag-length-for-withTags-parameter.txt, regionserver-logging.diff, testrun_0.98.txt, testrun_branch1.0.txt
>
>
> Tried to write a simple Java client against 1.1.0-SNAPSHOT.
> * Write 1M rows, each row with 1 family, and 10 qualifiers (values [0-9]), for a total of 10M cells written
> * Read back the data from the table, ensure I saw 10M cells
> Running it against {{04ac1891}} (and earlier) yesterday, I would get ~20% of the actual rows. Running against 1.0.0 returns all 10M records as expected.
> [Code I was running|https://github.com/joshelser/hbase-hwhat/blob/master/src/main/java/hbase/HBaseTest.java] for the curious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
