hbase-issues mailing list archives

From "Jonathan Lawlor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11544) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME
Date Thu, 19 Feb 2015 16:54:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327722 ]

Jonathan Lawlor commented on HBASE-11544:

[~lhofhansl] thanks for the comments

bq. Is the limit per cell or per row?

Sorry, let me be clear in what I mean when I say cell level and row level:

Partitioning at the row level (the current behavior):
Currently, maxResultSize operates at the row level on the server: the result size limit is
checked only after each full row's worth of cells has been fetched. This can lead to an OOME
on large rows, because a single row may be many times larger than maxResultSize. When
retrieving all the cells of a single large row, we would keep traversing the row even after
passing the result size limit, and would only realize the limit had been exceeded once the
entire row's worth of cells had been fetched.
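The problem can be sketched as a small simulation; this is illustrative only, not HBase's actual scanner internals, and the class/method names and the modeling of cells as plain byte counts are assumptions:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative simulation of row-level size checking (NOT HBase's real
// scanner code). Each row is an array of cell sizes in bytes.
public class RowLevelLimit {

    // Accumulates rows until maxResultSize is reached, but only checks the
    // limit AFTER each complete row has been buffered. Returns the running
    // byte total after each fetched row.
    static List<Long> fetchRows(long[][] rowCellSizes, long maxResultSize) {
        List<Long> totals = new ArrayList<>();
        long total = 0;
        for (long[] row : rowCellSizes) {
            long rowBytes = 0;
            for (long cell : row) {
                rowBytes += cell;          // the whole row is buffered first
            }
            total += rowBytes;
            totals.add(total);
            if (total >= maxResultSize) {  // limit only seen at a row boundary
                break;
            }
        }
        return totals;
    }
}
```

With maxResultSize = 50 and a row of three 1000-byte cells, the loop still buffers all 3000 bytes before noticing the limit; that unbounded overshoot is exactly what can OOME the server.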

Partitioning at the cell level (the new behavior):
The solution implemented above moves the maxResultSize check down from the row level to the
cell level: the result size limit is checked after each cell/KeyValue is fetched. This gives
a much more precise bound on result size than the current behavior. When the limit is reached
while fetching the cells/KeyValues of a particular row, that row is returned as partial
results that must be reconstructed client-side (i.e. the server never holds the entire row's
worth of cells in memory at once).
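The cell-level check can be sketched the same way; again an illustrative simulation with assumed names, not the actual patch:

```java
// Illustrative simulation of cell-level size checking (NOT HBase's real
// scanner code); cells are modeled as byte counts.
public class CellLevelLimit {

    // Checks the limit after EACH cell, so the overshoot is bounded by a
    // single cell. Returns {cellsFetched, bytesAccumulated}.
    static long[] fetchCells(long[] cellSizes, long maxResultSize) {
        long total = 0;
        int fetched = 0;
        for (long cell : cellSizes) {
            total += cell;
            fetched++;
            if (total >= maxResultSize) {
                break;  // stop mid-row; the rest comes back as another partial
            }
        }
        return new long[] {fetched, total};
    }
}
```

For the same 3000-byte row with maxResultSize = 50, the loop now stops after the first 1000-byte cell instead of buffering the whole row, which is the precision improvement described above.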

So when I said the server will only ever see partial results for very large rows, I mean
that the server returns such a row as a sequence of partial results across separate RPC
responses, and thus never holds the entire row in memory, only parts of it at different
points in time.
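Client-side, the partials for one row then just need to be stitched back together in arrival order before being handed to the application. A minimal sketch, assuming each RPC delivers an ordered chunk of the row's cells (the method name and the list-of-strings model are illustrative, not the real client API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative client-side reassembly of partial results for a single row.
public class PartialReassembly {

    // Concatenates the cell chunks from successive RPC responses, in order,
    // to rebuild the complete row.
    static List<String> reassemble(List<List<String>> partials) {
        List<String> row = new ArrayList<>();
        for (List<String> chunk : partials) {
            row.addAll(chunk);
        }
        return row;
    }
}
```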

> [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME
> ------------------------------------------------------------------------------------------------------
>                 Key: HBASE-11544
>                 URL: https://issues.apache.org/jira/browse/HBASE-11544
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Jonathan Lawlor
>            Priority: Critical
>              Labels: beginner
>         Attachments: HBASE-11544-v1.patch
> Running some tests, I set hbase.client.scanner.caching=1000. Dataset has large cells. I kept OOME'ing.
> Serverside, we should measure how much we've accumulated and return to the client whatever we've gathered once we pass a certain size threshold rather than keep accumulating till we OOME.

This message was sent by Atlassian JIRA
