hbase-issues mailing list archives

From "Jonathan Lawlor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11544) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME
Date Mon, 23 Mar 2015 23:39:54 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376919#comment-14376919 ]

Jonathan Lawlor commented on HBASE-11544:
-----------------------------------------

[~apurtell] [~lhofhansl] Thanks for bringing up these discussion points. I have described
the design decisions behind them below, and it would be great to hear your thoughts.

bq. If scanning millions of rows, millions of objects?

Yes.

bq. The size estimations are done up in RSRpcServices

To avoid out of memory errors that resulted from very large rows, the size calculation was
pushed all the way down into StoreScanner to be performed between cells (rather than between
rows in RSRpcServices). This meant that we may reach the size limit in the middle of a row
and form a partial result.
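
As a rough illustration of checking the size limit between cells rather than between rows, here is a minimal sketch. It is not the patch code: the class name, the method fillUpToLimit, and the use of byte[] length as the cell size are all hypothetical stand-ins, and it assumes the limit is checked after each cell is added so that the scan always makes progress.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names are HBase APIs.
class CellSizeLimitSketch {

    /**
     * Copies cells of a single row into out until the row ends or the
     * accumulated size reaches maxBytes. Returns true when the row was cut
     * short, i.e. out holds only a partial result for the row.
     */
    static boolean fillUpToLimit(List<byte[]> rowCells, long maxBytes, List<byte[]> out) {
        long accumulated = 0;
        for (int i = 0; i < rowCells.size(); i++) {
            byte[] cell = rowCells.get(i);
            out.add(cell);
            accumulated += cell.length; // assumption: size of a cell is its byte length
            boolean moreCellsInRow = i < rowCells.size() - 1;
            // The check runs between cells, so a very large row is cut off
            // before it is fully accumulated in memory.
            if (accumulated >= maxBytes && moreCellsInRow) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<byte[]> row = Arrays.asList(new byte[40], new byte[40], new byte[40]);
        List<byte[]> out = new ArrayList<>();
        boolean partial = fillUpToLimit(row, 64, out);
        System.out.println("cells=" + out.size() + " partial=" + partial);
    }
}
```

With a between-rows check, all three cells would have been accumulated before the limit was ever consulted.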

With the size calculation pushed all the way down to StoreScanner, we needed some way of communicating
upwards to the RegionScanner and RSRpcServices when a partial result is formed (i.e. we reach
the size limit in the middle of a row). At first, the intention was to NOT change the return
type from boolean. However, the implementation with the boolean return type ended up requiring
many repetitions of the size calculation. 

With the boolean return type, the RegionScanner and RSRpcServices both needed to calculate
the result size (in addition to the calculation that had been pushed down to StoreScanner).
They needed to do this to check whether the size limit had been reached, since a boolean
that only indicates whether more values exist cannot communicate that upwards. The problems
with this approach were:
* The size calculation was repeated at every layer (StoreScanner, RegionScanner, and RSRpcServices)
* The state was not explicit enough. Cells were being returned from StoreScanner and it was
up to the caller of StoreScanner#next to figure out why these were the cells being returned
(size limit reached? batch limit reached?). The only way for the state to bubble up from the
StoreScanner was to repeat all of the logic that made the StoreScanner return those Cells.

NextState was introduced to make this communication more explicit and to avoid repeating
the size calculation.
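
To make the trade-off concrete, here is a minimal sketch of a return value that carries both the reason the scan stopped and the size already computed. It is an assumption-laden illustration, not the NextState class from the patch: ScanOutcome, Reason, storeNext and regionNext are hypothetical names, and cell size is again approximated by byte[] length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; these types only illustrate the idea and are not HBase APIs.
class NextStateSketch {

    enum Reason { NO_MORE_VALUES, SIZE_LIMIT_REACHED, BATCH_LIMIT_REACHED }

    /** Replaces a bare boolean: says why the scan stopped and how much was gathered. */
    static final class ScanOutcome {
        final Reason reason;
        final long resultSize;

        ScanOutcome(Reason reason, long resultSize) {
            this.reason = reason;
            this.resultSize = resultSize;
        }

        /** What the old boolean return value used to convey. */
        boolean hasMoreValues() {
            return reason != Reason.NO_MORE_VALUES;
        }
    }

    /** Innermost layer: the only place where sizes are summed. */
    static ScanOutcome storeNext(List<byte[]> cells, int batchLimit, long sizeLimit,
                                 List<byte[]> out) {
        long size = 0;
        int added = 0;
        for (int i = 0; i < cells.size(); i++) {
            out.add(cells.get(i));
            size += cells.get(i).length;
            added++;
            if (i == cells.size() - 1) {
                break; // nothing left, so no limit can leave a remainder
            }
            if (size >= sizeLimit) {
                return new ScanOutcome(Reason.SIZE_LIMIT_REACHED, size);
            }
            if (added >= batchLimit) {
                return new ScanOutcome(Reason.BATCH_LIMIT_REACHED, size);
            }
        }
        return new ScanOutcome(Reason.NO_MORE_VALUES, size);
    }

    /** Outer layer: forwards the outcome instead of re-deriving it from the cells. */
    static ScanOutcome regionNext(List<byte[]> cells, int batchLimit, long sizeLimit,
                                  List<byte[]> out) {
        return storeNext(cells, batchLimit, sizeLimit, out);
    }

    public static void main(String[] args) {
        List<byte[]> cells = Arrays.asList(new byte[40], new byte[40], new byte[40]);
        ScanOutcome outcome = regionNext(cells, 10, 64, new ArrayList<>());
        System.out.println(outcome.reason + " after " + outcome.resultSize + " bytes");
    }
}
```

The cost raised in the review is visible here as well: every call allocates one ScanOutcome, which is the object-per-call overhead that has to be weighed against recomputing the size at each layer.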

Any alternative approaches are welcome. If there is a way to keep the boolean return type
and avoid repeating the size calculation, we could certainly try that alternative. Or,
if repeating the size calculation is less costly than allocating a NextState, perhaps we
should go down that route.

> [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-11544
>                 URL: https://issues.apache.org/jira/browse/HBASE-11544
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Jonathan Lawlor
>            Priority: Critical
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: HBASE-11544-branch_1_0-v1.patch, HBASE-11544-branch_1_0-v2.patch,
> HBASE-11544-v1.patch, HBASE-11544-v2.patch, HBASE-11544-v3.patch, HBASE-11544-v4.patch, HBASE-11544-v5.patch,
> HBASE-11544-v6.patch, HBASE-11544-v6.patch, HBASE-11544-v6.patch, HBASE-11544-v7.patch, HBASE-11544-v8-branch-1.patch,
> HBASE-11544-v8.patch, gc.j.png, hits.j.png, mean.png, net.j.png
>
>
> Running some tests, I set hbase.client.scanner.caching=1000.  Dataset has large cells.
> I kept OOME'ing.
> Serverside, we should measure how much we've accumulated and return to the client whatever
> we've gathered once we pass out a certain size threshold rather than keep accumulating till
> we OOME.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
