hbase-issues mailing list archives

From "Qiang Tian (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11544) [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME
Date Sat, 26 Jul 2014 12:18:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075363#comment-14075363 ]

Qiang Tian commented on HBASE-11544:

Good discussion!

If I understand correctly, there are 3 concerns here:
1) OOME due to a big scan cache size.
Most users probably do not call Scan#setMaxResultSize or configure {{"hbase.client.scanner.max.result.size"}}.
As Anoop mentioned, there is no cap by default (Long.MAX_VALUE). That looks like a bug that we can
fix now. Adding a "total amount of memory threshold to reject new scanners", as Mikhail mentioned,
would make it more robust.

2) Partial rows, when the cell size is very big.
When could we hit this?
From my reading of the code (0.98.2), if the user sets Scan.setBatch/Scan.setMaxResultSize, HRegion#nextRaw/HRegionServer#scan
use it to limit the number of rows, but do not create partial rows?

3) Bytes sent per RPC request, network bandwidth.
My understanding is that HBase can definitely control how it sends data, but the user still has the right
to set how many rows they want to get per scan? So these are things at different layers: the
former at the RPC layer, the latter at the scan semantics layer?
If the server gathers data until at least one chunk is filled and then sends the chunk to the client, what
if the response data is smaller than a chunk? In the current code the responder sends data per
call response per socket. But we can control the write size when the response is bigger than the
chunk size, since BufferChain#write writes as much as possible to the channel.
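
To make the last point concrete, here is a rough sketch of capping how many bytes go into each channel write. This is not HBase's BufferChain; the class ChunkedChannelWriter, the method writeInChunks, and the chunkSize parameter are made up for illustration, and a blocking channel is assumed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper, not part of HBase: limits the bytes offered to each write call.
final class ChunkedChannelWriter {
    private ChunkedChannelWriter() {}

    /**
     * Writes all remaining bytes of data to the channel, offering at most
     * chunkSize bytes per write call. Returns the number of write calls made.
     * Assumes a blocking channel; a non-blocking channel that accepts 0 bytes
     * would need selector-based handling instead of this loop.
     */
    static int writeInChunks(WritableByteChannel channel, ByteBuffer data, int chunkSize)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int calls = 0;
        while (data.hasRemaining()) {
            int len = Math.min(chunkSize, data.remaining());
            // A duplicate shares the content but has its own position and limit,
            // so it can expose only the next chunk to the channel.
            ByteBuffer slice = data.duplicate();
            slice.limit(slice.position() + len);
            int written = channel.write(slice);
            // Advance the source by what the channel actually accepted.
            data.position(data.position() + written);
            calls++;
        }
        return calls;
    }
}
```

A response smaller than chunkSize simply goes out in a single write, which is the "smaller than a chunk" case asked about above.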

just my $0.002:-)

> [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME
> ------------------------------------------------------------------------------------------------------
>                 Key: HBASE-11544
>                 URL: https://issues.apache.org/jira/browse/HBASE-11544
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>              Labels: noob
> Running some tests, I set hbase.client.scanner.caching=1000.  Dataset has large cells.  I kept OOME'ing.
> Serverside, we should measure how much we've accumulated and return to the client whatever
> we've gathered once we pass out a certain size threshold rather than keep accumulating till
> we OOME.
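
The server-side behaviour asked for in the description could look roughly like the sketch below. It is a self-contained simulation, not HBase code: SizeCappedBatcher, nextBatch, maxRows, and maxBytes are hypothetical names, with maxRows standing in for hbase.client.scanner.caching and maxBytes for a max result size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not HBase code: caps a scan batch by bytes as well as by row count.
final class SizeCappedBatcher {
    private SizeCappedBatcher() {}

    /**
     * Collects rows starting at index start until either maxRows rows have been
     * gathered or the accumulated size reaches maxBytes. The row that crosses the
     * threshold is still included, so at least one row is returned whenever any
     * remain and a single oversized row cannot stall the scan.
     */
    static List<byte[]> nextBatch(List<byte[]> rows, int start, int maxRows, long maxBytes) {
        List<byte[]> batch = new ArrayList<>();
        long accumulated = 0;
        for (int i = start; i < rows.size() && batch.size() < maxRows; i++) {
            byte[] row = rows.get(i);
            batch.add(row);
            accumulated += row.length;
            if (accumulated >= maxBytes) {
                break; // size threshold passed: return what we have gathered
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        List<byte[]> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(new byte[400]);
        }
        // Row limit of 1000, but a 1000-byte cap ends the batch early.
        System.out.println(nextBatch(rows, 0, 1000, 1000).size());
    }
}
```

With the default of no cap (Long.MAX_VALUE), only the row count limits the batch, which is the situation that leads to the OOME described above.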
