hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Lawlor (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-11544) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME
Date Wed, 04 Mar 2015 20:09:40 GMT

     [ https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Lawlor updated HBASE-11544:
------------------------------------
    Release Note: 
Results returned from RPC calls may now be returned as partials

When is a Result marked as a partial? 
When the server must stop the scan because the max size limit has been reached. Means that
the LAST Result returned within the ScanResult's Result array may be marked as a partial if
the scan's max size limit caused it to stop in the middle of a row.

Incompatible Change: The return type of InternalScanners#next and RegionScanners#nextRaw has
been changed to NextState from boolean
The previous boolean return value can be accessed via NextState#hasMoreValues()
Provides more context as to what happened inside the scanner

Caching default has been changed to Integer.Max_Value 
This value works together with the new maxResultSize value from HBASE-12976 (defaults to 2MB)

Results returned from server on basis of size rather than number of rows
Provides better use of network since row size varies amongst tables

Protobuf models have changed for Result, ScanRequest, and ScanResponse to support new partial
Results

Partial Results should be invisible to application layer unless Scan#setAllowPartials is set

Scan#setAllowPartials has been added to allow the application to request to see the partial
Results returned by the server rather than have the ClientScanner form the complete Result
prior to returning it to the application

To disable the use of partial Results on the server, set ScanRequest.Builder#setClientHandlesPartials()
to be false in the ScanRequest issued to server

Partial Results should allow the server to return large rows in parts rather than accumulate
all the cells for that particular row and run out of memory

  was:
Results returned from RPC calls may now be returned as partials

When is a Result marked as a partial? 
When the server must stop the scan because the max size limit has been reached. Means that
the LAST Result returned within the ScanResult's Result array may be marked as a partial if
the scan's max size limit caused it to stop in the middle of a row.

Incompatible Change: The return type of InternalScanners and RegionScanners has been changed
to NextState from boolean
The previous boolean return value can be accessed via NextState#hasMoreValues()
Provides more context as to what happened inside the scanner

Caching default has been changed to Integer.Max_Value 
This value works together with the new maxResultSize value from HBASE-12976 (defaults to 2MB)

Results returned from server on basis of size rather than number of rows
Provides better use of network since row size varies amongst tables

Protobuf models have changed for Result, ScanRequest, and ScanResponse to support new partial
Results

Partial Results should be invisible to application layer unless Scan#setAllowPartials is set

Scan#setAllowPartials has been added to allow the application to request to see the partial
Results returned by the server rather than have the ClientScanner form the complete Result
prior to returning it to the application

To disable the use of partial Results on the server, set ScanRequest.Builder#setClientHandlesPartials()
to be false in the ScanRequest issued to server

Partial Results should allow the server to return large rows in parts rather than accumulate
all the cells for that particular row and run out of memory


> [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even
if it means OOME
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-11544
>                 URL: https://issues.apache.org/jira/browse/HBASE-11544
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Jonathan Lawlor
>            Priority: Critical
>              Labels: beginner
>         Attachments: HBASE-11544-branch_1_0-v1.patch, HBASE-11544-branch_1_0-v2.patch,
HBASE-11544-v1.patch, HBASE-11544-v2.patch, HBASE-11544-v3.patch, HBASE-11544-v4.patch, HBASE-11544-v5.patch,
HBASE-11544-v6.patch, HBASE-11544-v6.patch, HBASE-11544-v6.patch, HBASE-11544-v7.patch, gc.j.png,
hits.j.png, mean.png, net.j.png
>
>
> Running some tests, I set hbase.client.scanner.caching=1000.  Dataset has large cells.
 I kept OOME'ing.
> Serverside, we should measure how much we've accumulated and return to the client whatever
we've gathered once we pass out a certain size threshold rather than keep accumulating till
we OOME.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message