hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16973) Revisiting default value for hbase.client.scanner.caching
Date Mon, 31 Oct 2016 16:52:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622704#comment-15622704
] 

stack commented on HBASE-16973:
-------------------------------

Dang. Good finding [~carp84]

Let's make the [~yangzhe1991] story the way it is going forward. File an issue to update the refguide,
javadoc, and unit tests so they all enforce "...Setting cache is an old style to limit size and
time...".

For released software, let's add a warning to the refguide, as per @yu li's recommendation, on migrating
from 0.98 to 1.1.x (new issue or part of this issue?). I think changing the default at this stage
of the 1.1.x and 1.2.x lifecycle would surprise more than it would help,
but we could add a notice on the downloads page and to the release notes about this finding of [~carp84]'s?

> Revisiting default value for hbase.client.scanner.caching
> ---------------------------------------------------------
>
>                 Key: HBASE-16973
>                 URL: https://issues.apache.org/jira/browse/HBASE-16973
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Yu Li
>            Assignee: Yu Li
>         Attachments: Scan.next_p999.png
>
>
> We are observing below logs for a long-running scan:
> {noformat}
> 2016-10-30 08:51:41,692 WARN  [B.defaultRpcServer.handler=50,queue=12,port=16020] ipc.RpcServer:
> (responseTooSlow-LongProcessTime): {"processingtimems":24329,
> "call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)",
> "client":"11.251.157.108:50415","scandetails":"table: ae_product_image region: ae_product_image,494:
> ,1476872321454.33171a04a683c4404717c43ea4eb8978.","param":"scanner_id: 5333521 number_of_rows: 2147483647
> close_scanner: false next_call_seq: 8 client_handles_partials: true client_handles_heartbeats: true",
> "starttimems":1477788677363,"queuetimems":0,"class":"HRegionServer","responsesize":818,"method":"Scan"}
> {noformat}
> From this we found that "number_of_rows" is as big as {{Integer.MAX_VALUE}}.
> We also observed a long filter list on the customized scan. After checking the application
> code we confirmed that there is no {{Scan.setCaching}} call or {{hbase.client.scanner.caching}}
> setting on the client side, so with the default value the caching for a Scan turns out to
> be {{Integer.MAX_VALUE}}, which is really a big surprise.
> After checking the code and commit history, I found that HBASE-11544 changed {{HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING}}
> from 100 to {{Integer.MAX_VALUE}}, and its release note says:
> {noformat}
> Scan caching default has been changed to Integer.Max_Value 
> This value works together with the new maxResultSize value from HBASE-12976 (defaults to 2MB)
> Results returned from server on basis of size rather than number of rows 
> Provides better use of network since row size varies amongst tables
> {noformat}
> I'm afraid this does not consider the case of a scan with filters, which may
> examine many rows but return only a small result.
> What's more, we still have the comment/code below in {{Scan.java}}:
> {code}
>   /*
>    * -1 means no caching
>    */
>   private int caching = -1;
> {code}
> But the actual implementation does not follow this (instead of no caching, we are caching
> {{Integer.MAX_VALUE}} rows...).
> So here I'd like to bring up two points:
> 1. Change the default value of {{HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING}} back to
> some small value like 128
> 2. Re-enforce the semantics of "no caching"
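
To illustrate the filtered-scan concern in the description above, here is a toy model in plain Java. It is not HBase code: {{ScanBatchModel}}, {{simulateBatch}}, the row size, and the filter selectivity are all hypothetical, and the model ignores any time limits or heartbeats a real region server may apply. It only sketches how a row-count limit and a size limit interact when a filter passes very few rows.

```java
import java.util.function.LongPredicate;

// Toy model of one scan call: NOT HBase code, all names and numbers are illustrative.
final class ScanBatchModel {

    /** Outcome of one simulated scan call. */
    static final class BatchResult {
        final long rowsExamined; // rows the server had to read and filter
        final int rowsReturned;  // rows that passed the filter and were sent back

        BatchResult(long rowsExamined, int rowsReturned) {
            this.rowsExamined = rowsExamined;
            this.rowsReturned = rowsReturned;
        }
    }

    /**
     * Reads rows until the table ends, the row-count limit ("caching") is hit,
     * or the accumulated result size reaches maxResultSizeBytes.
     * Only rows that pass the filter count toward either limit.
     */
    static BatchResult simulateBatch(long totalRows, int rowSizeBytes, int caching,
                                     long maxResultSizeBytes, LongPredicate filter) {
        long examined = 0;
        int returned = 0;
        long bytes = 0;
        while (examined < totalRows && returned < caching && bytes < maxResultSizeBytes) {
            long row = examined++;
            if (filter.test(row)) {
                returned++;
                bytes += rowSizeBytes;
            }
        }
        return new BatchResult(examined, returned);
    }

    public static void main(String[] args) {
        long totalRows = 1_000_000L;
        int rowSizeBytes = 1024;
        long maxResultSizeBytes = 2L * 1024 * 1024;               // 2MB, as in the release note
        LongPredicate selective = row -> row % 10_000 == 0;       // passes 1 row in 10,000

        BatchResult unbounded = simulateBatch(totalRows, rowSizeBytes,
                Integer.MAX_VALUE, maxResultSizeBytes, selective);
        BatchResult bounded = simulateBatch(totalRows, rowSizeBytes,
                10, maxResultSizeBytes, selective);

        // caching=Integer.MAX_VALUE: examined=1000000 returned=100
        System.out.println("caching=Integer.MAX_VALUE: examined=" + unbounded.rowsExamined
                + " returned=" + unbounded.rowsReturned);
        // caching=10: examined=90001 returned=10
        System.out.println("caching=10: examined=" + bounded.rowsExamined
                + " returned=" + bounded.rowsReturned);
    }
}
```

In this model the 100 matching rows add up to roughly 100KB, so the 2MB size limit never triggers and the call with an unbounded row count walks the whole table. A small row-count limit ends the call after far fewer rows, which is the behaviour the proposal to lower the default is after.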



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
