Date: Mon, 31 Oct 2016 11:55:58 +0000 (UTC)
From: "Phil Yang (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-16973) Revisiting default value for hbase.client.scanner.caching

[ https://issues.apache.org/jira/browse/HBASE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15621957#comment-15621957 ]

Phil Yang commented on HBASE-16973:
-----------------------------------

Yes, in 1.2.x this feature is useless...
But if this feature works (for example, since 1.3.0), I think a time limit and a size limit are more direct for users than caching, and these two limits are enough. I don't think users need to know how many rows the client will "cache" for one call. Setting caching is an old-style way to limit size and time; what users really need is to limit time and size, right? If we can guarantee that we will respond in time and will not return too much data, we should read as much as possible to speed up the overall scan.

> Revisiting default value for hbase.client.scanner.caching
> ---------------------------------------------------------
>
>                 Key: HBASE-16973
>                 URL: https://issues.apache.org/jira/browse/HBASE-16973
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Yu Li
>            Assignee: Yu Li
>         Attachments: Scan.next_p999.png
>
>
> We are observing the logs below for a long-running scan:
> {noformat}
> 2016-10-30 08:51:41,692 WARN [B.defaultRpcServer.handler=50,queue=12,port=16020] ipc.RpcServer:
> (responseTooSlow-LongProcessTime): {"processingtimems":24329,
> "call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)",
> "client":"11.251.157.108:50415","scandetails":"table: ae_product_image region: ae_product_image,494:
> ,1476872321454.33171a04a683c4404717c43ea4eb8978.","param":"scanner_id: 5333521 number_of_rows: 2147483647
> close_scanner: false next_call_seq: 8 client_handles_partials: true client_handles_heartbeats: true",
> "starttimems":1477788677363,"queuetimems":0,"class":"HRegionServer","responsesize":818,"method":"Scan"}
> {noformat}
> From this we found that "number_of_rows" is as big as {{Integer.MAX_VALUE}}.
> We also observed a long filter list on the customized scan. After checking the application code, we confirmed that there is no {{Scan.setCaching}} or {{hbase.client.scanner.caching}} setting on the client side, so it turns out that with the default value the caching for Scan will be Integer.MAX_VALUE, which is really a big surprise.
> After checking the code and commit history, I found that it is HBASE-11544 which changed {{HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING}} from 100 to Integer.MAX_VALUE, and in the release note there I could see the following:
> {noformat}
> Scan caching default has been changed to Integer.Max_Value
> This value works together with the new maxResultSize value from HBASE-12976 (defaults to 2MB)
> Results returned from server on basis of size rather than number of rows
> Provides better use of network since row size varies amongst tables
> {noformat}
> I'm afraid this does not consider the case of a scan with filters, which may involve many rows but return only a small result.
> What's more, we still have the following comment/code in {{Scan.java}}:
> {code}
>   /*
>    * -1 means no caching
>    */
>   private int caching = -1;
> {code}
> But the implementation does not actually follow it (instead of no caching, we are caching {{Integer.MAX_VALUE}}...).
> So here I'd like to bring up two points:
> 1. Change the default value of HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING back to some small value like 128
> 2. Reinforce the semantics of "no caching"

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
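
To make the three limits discussed in this thread concrete (row-count caching, result size, and time), here is a minimal toy model of a single scan call. It is not HBase code: the class ScanBatchSketch, the method nextBatch, and the parameter timeBudgetRows are hypothetical names invented for illustration, and the time limit is approximated as "how many rows the server can examine before its deadline". The model also assumes that the caching limit counts only rows that pass the filter.

```java
import java.util.function.IntPredicate;

/** Toy model of one scan call. Hypothetical names; not the HBase implementation. */
public class ScanBatchSketch {

    /** Outcome of one simulated call. */
    static final class Batch {
        final int rowsExamined;   // rows read on the server side
        final int rowsReturned;   // rows that passed the filter
        final long bytesReturned; // payload size of the returned rows

        Batch(int rowsExamined, int rowsReturned, long bytesReturned) {
            this.rowsExamined = rowsExamined;
            this.rowsReturned = rowsReturned;
            this.bytesReturned = bytesReturned;
        }
    }

    /**
     * Simulates one call over a region of totalRows rows of rowBytes bytes each.
     * The call stops when any limit is hit: caching (returned rows),
     * maxResultSize (returned bytes), or timeBudgetRows (assumed stand-in for a
     * time limit, expressed as a number of rows examined).
     */
    static Batch nextBatch(int totalRows, int rowBytes, IntPredicate filter,
                           int caching, long maxResultSize, long timeBudgetRows) {
        int examined = 0;
        int returned = 0;
        long bytes = 0;
        while (examined < totalRows
                && returned < caching
                && bytes < maxResultSize
                && examined < timeBudgetRows) {
            if (filter.test(examined)) {
                returned++;
                bytes += rowBytes;
            }
            examined++;
        }
        return new Batch(examined, returned, bytes);
    }

    public static void main(String[] args) {
        int totalRows = 1_000_000;
        int rowBytes = 1024;
        long twoMb = 2L * 1024 * 1024;
        // Highly selective filter: only every 100,000th row passes.
        IntPredicate selective = row -> row % 100_000 == 0;

        Batch unbounded = nextBatch(totalRows, rowBytes, selective,
                Integer.MAX_VALUE, twoMb, Long.MAX_VALUE);
        Batch smallCaching = nextBatch(totalRows, rowBytes, selective,
                128, twoMb, Long.MAX_VALUE);
        Batch timeBounded = nextBatch(totalRows, rowBytes, selective,
                Integer.MAX_VALUE, twoMb, 50_000);

        System.out.println("caching=MAX_VALUE: examined " + unbounded.rowsExamined
                + ", returned " + unbounded.rowsReturned);
        System.out.println("caching=128:       examined " + smallCaching.rowsExamined
                + ", returned " + smallCaching.rowsReturned);
        System.out.println("time budget:       examined " + timeBounded.rowsExamined
                + ", returned " + timeBounded.rowsReturned);
    }
}
```

Under these assumptions, the selective filter passes only 10 rows (about 10 KB), so neither the 2MB size limit nor a caching value of 128 ends the call early, and only the time budget bounds the work done in one call. With a filter that passes every row, the size limit stops the call after 2048 rows. How closely this matches a real region server depends on the HBase version and on whether the time limit is in effect, as noted in the comment above.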