hbase-issues mailing list archives

From "Jonathan Lawlor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13262) ResultScanner doesn't return all rows in Scan
Date Wed, 18 Mar 2015 00:50:38 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366412#comment-14366412 ]

Jonathan Lawlor commented on HBASE-13262:
-----------------------------------------

bq. Are you implying that this is specifically the problem? I'm not seeing where these sizes
are used for anything more than metrics tracking

Within {{RSRpcServices#scan(...)}} we keep a running tally of the size of the accumulated
{{Result}}s in the variable {{currentScanResultSize}}. We collect the {{Result}}s in a while
loop that runs until the caching limit is reached. At the beginning of each iteration
of this loop, we check the running size against {{maxResultSize}}. If the size limit
has been reached, we break out of the loop and return whatever Results we have
accumulated thus far to the client. The problem is that we then expect the client
to realize that the Results it received have reached {{maxResultSize}}. If the client's
size calculation comes out lower than the server's, the client may misinterpret
the response as meaning the region has been exhausted.
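
The failure mode described above can be sketched with a small, self-contained simulation. This is not the HBase code: {{FakeResult}}, {{buildRegion}}, the byte sizes, and the client-side inference rule in {{clientThinksRegionExhausted}} are all hypothetical and exist only to illustrate how a lower client-side size estimate can make a size-limited batch look like the end of a region.

```java
import java.util.ArrayList;
import java.util.List;

public class ScanSizeMismatchSketch {

    // Hypothetical stand-in for a Result: it only carries two size estimates.
    static final class FakeResult {
        final long serverSize; // size as the server would compute it
        final long clientSize; // size as the client would compute it

        FakeResult(long serverSize, long clientSize) {
            this.serverSize = serverSize;
            this.clientSize = clientSize;
        }
    }

    // Builds a pretend region in which every row has the same two size estimates.
    static List<FakeResult> buildRegion(int rows, long serverSize, long clientSize) {
        List<FakeResult> region = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            region.add(new FakeResult(serverSize, clientSize));
        }
        return region;
    }

    // Server side: loop while the caching limit has not been reached, and
    // check the running size against maxResultSize at the top of each iteration.
    static List<FakeResult> serverScan(List<FakeResult> region, int start,
                                       int caching, long maxResultSize) {
        List<FakeResult> batch = new ArrayList<>();
        long currentScanResultSize = 0;
        int next = start;
        while (batch.size() < caching && next < region.size()) {
            if (currentScanResultSize >= maxResultSize) {
                break; // size limit reached: return what we have so far
            }
            FakeResult r = region.get(next++);
            batch.add(r);
            currentScanResultSize += r.serverSize;
        }
        return batch;
    }

    // Client side (assumed rule): a batch that is below the caching limit and,
    // by the client's own arithmetic, below the size limit is read as
    // "nothing left in this region".
    static boolean clientThinksRegionExhausted(List<FakeResult> batch,
                                               int caching, long maxResultSize) {
        long clientComputedSize = 0;
        for (FakeResult r : batch) {
            clientComputedSize += r.clientSize;
        }
        return batch.size() < caching && clientComputedSize < maxResultSize;
    }

    public static void main(String[] args) {
        // 100 rows; the server counts 100 bytes per row, the client only 80.
        List<FakeResult> region = buildRegion(100, 100, 80);
        List<FakeResult> batch = serverScan(region, 0, 50, 1000);
        System.out.println("rows returned: " + batch.size());
        System.out.println("rows left on server: " + (region.size() - batch.size()));
        System.out.println("client thinks region exhausted: "
                + clientThinksRegionExhausted(batch, 50, 1000));
    }
}
```

With these made-up numbers the server stops after 10 rows because its tally reaches 1000, while the client sums the same rows to 800 and, under the assumed rule, concludes the region is done even though 90 rows remain.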

bq. To me, the larger issue seems to be that only a Result[] is returned from ScannerCallable

I agree completely. It is ugly that we return ONLY a {{Result[]}} to the client and then expect
it to work out why those are the Results that were returned from the server. Was the size
limit reached? Was the caching limit reached? Was a partial result formed? Are there more
results on the server or is the region exhausted? There are too many things that the client
needs to infer from the {{Result[]}} alone that the server already had the answer to. I think
it would be great if this could be cleaned up.
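
One possible shape for such a cleanup, sketched under assumptions: the server returns the Results together with the reason it stopped, so the client never has to recompute sizes. {{ScanBatch}}, {{StopReason}}, and every field name below are hypothetical illustrations, not an existing or proposed HBase API.

```java
import java.util.Arrays;
import java.util.List;

public class ScanBatchSketch {

    // Hypothetical: why the server stopped filling this batch.
    enum StopReason { CACHING_LIMIT, SIZE_LIMIT, REGION_EXHAUSTED }

    // Hypothetical envelope that carries the answers the server already has.
    static final class ScanBatch<R> {
        final List<R> results;
        final StopReason stopReason;
        final boolean lastResultPartial;

        ScanBatch(List<R> results, StopReason stopReason, boolean lastResultPartial) {
            this.results = results;
            this.stopReason = stopReason;
            this.lastResultPartial = lastResultPartial;
        }

        // The client asks the envelope instead of inferring from sizes.
        boolean moreResultsInRegion() {
            return stopReason != StopReason.REGION_EXHAUSTED;
        }
    }

    public static void main(String[] args) {
        ScanBatch<String> batch =
                new ScanBatch<>(Arrays.asList("row1", "row2"), StopReason.SIZE_LIMIT, false);
        System.out.println("stopped because: " + batch.stopReason);
        System.out.println("keep scanning this region: " + batch.moreResultsInRegion());
    }
}
```

The design choice here is only that the stop reason travels with the data; how it would be encoded on the wire is left open.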

> ResultScanner doesn't return all rows in Scan
> ---------------------------------------------
>
>                 Key: HBASE-13262
>                 URL: https://issues.apache.org/jira/browse/HBASE-13262
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.0.0, 1.1.0
>         Environment: Single node, pseudo-distributed 1.1.0-SNAPSHOT
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: testrun_0.98.txt, testrun_branch1.0.txt
>
>
> Tried to write a simple Java client against 1.1.0-SNAPSHOT.
> * Write 1M rows, each row with 1 family and 10 qualifiers (values [0-9]), for a total of 10M cells written
> * Read back the data from the table, ensure I saw 10M cells
> Running it against {{04ac1891}} (and earlier) yesterday, I would get ~20% of the actual rows. Running against 1.0.0 returns all 10M records as expected.
> [Code I was running|https://github.com/joshelser/hbase-hwhat/blob/master/src/main/java/hbase/HBaseTest.java] for the curious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
