Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Tue, 21 Apr 2015 21:44:58 +0000 (UTC)
From: "Jonathan Lawlor (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12819806.1428609323000.67380.1429652698937@Atlassian.JIRA>
In-Reply-To: <JIRA.12819806.1428609323000@Atlassian.JIRA>
References: <JIRA.12819806.1428609323000@Atlassian.JIRA>
 <JIRA.12819806.1428609323089@arcas>
Subject: [jira] [Commented] (HBASE-13442) Rename scanner caching to a more
 semantically correct term such as row limit
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505836#comment-14505836 ] 

Jonathan Lawlor commented on HBASE-13442:
-----------------------------------------

bq. I don't see why an app should care specifically about how many rows the client transfers from the server in each RPC - bytes seem the more relevant currency to tune for performance.

Really good point; I can't think of such a scenario either. Certainly we want to return results from the server on the basis of size rather than some arbitrary number of rows (since row size can vary from table to table, there isn't a universally "good" row limit). This is supported by the move to the default configuration of (caching = Integer.MAX_VALUE, maxResultSize = 2 MB). So the best course of action here wouldn't be to rename caching, but to deprecate it so that it can eventually be removed completely in favor of rowLimit.
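
As a rough illustration of why a byte budget generalizes better than a row count, here is a toy Python model (not HBase code; next_batch and the row sizes are hypothetical, and the real server-side accounting may differ in detail). It fills one response until either the row cap (caching) or the byte budget (maxResultSize) is reached.

```python
INT_MAX = 2**31 - 1          # stand-in for Java's Integer.MAX_VALUE
TWO_MB = 2 * 1024 * 1024     # the 2 MB default mentioned above


def next_batch(row_sizes, caching, max_result_size):
    """Return how many rows go into one response in this toy model.

    Assumption: rows are appended until the row cap is reached or the
    accumulated size reaches max_result_size bytes, whichever comes first.
    """
    rows, size = 0, 0
    for row_size in row_sizes:
        if rows >= caching or size >= max_result_size:
            break
        rows += 1
        size += row_size
    return rows


small_rows = [100] * 50_000          # hypothetical table with 100-byte rows
large_rows = [512 * 1024] * 1_000    # hypothetical table with 512 KB rows

# A fixed row cap of 100 ships only ~10 KB per response for small rows.
capped_small = next_batch(small_rows, 100, TWO_MB)

# With the cap effectively disabled, the byte budget adapts to the row size.
sized_small = next_batch(small_rows, INT_MAX, TWO_MB)
sized_large = next_batch(large_rows, INT_MAX, TWO_MB)

print(capped_small, sized_small, sized_large)
```

In this model the same byte budget yields thousands of small rows or a handful of large ones per response, whereas any single row count is too small for one table or too large for the other.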

The feature in the protocol that allows the client to ask for a certain number of rows would remain, but would only be used for backwards compatibility and for the scenario where the client wants to limit itself to a certain number of rows. Makes sense to me.

With such a change, we would also want to remove any associated configurations for caching/rowLimit in hbase-site.xml and hbase-default.xml. There isn't a scenario (at least that I can think of) where it would be appropriate to limit all scans to a particular number of rows and then close them. The row limit would be like the startRow or stopRow settings on scans: configured on a per-scan basis, with no means to set a global default for all scans.
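
To sketch what "per scan, no global default" could look like, here is a small Python stand-in (again not HBase code; the scan function, its parameters, and the dict-backed table are all hypothetical). The row limit is just another argument alongside the start and stop rows, and leaving it unset means no limit.

```python
def scan(table, start_row=None, stop_row=None, row_limit=None):
    """Yield (key, value) pairs in key order from a dict standing in for a table.

    row_limit is a per-call setting, like start_row and stop_row; there is
    deliberately no module-level default that would apply to every scan.
    """
    keys = sorted(
        k for k in table
        if (start_row is None or k >= start_row)
        and (stop_row is None or k < stop_row)   # stop row is exclusive here
    )
    if row_limit is not None:
        keys = keys[:row_limit]                  # stop after row_limit rows
    for k in keys:
        yield k, table[k]


table = {"row%02d" % i: i for i in range(10)}

first_three = [k for k, _ in scan(table, row_limit=3)]
unlimited = [k for k, _ in scan(table)]
ranged = [k for k, _ in scan(table, start_row="row05", stop_row="row08", row_limit=2)]

print(first_three, len(unlimited), ranged)
```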

> Rename scanner caching to a more semantically correct term such as row limit
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-13442
>                 URL: https://issues.apache.org/jira/browse/HBASE-13442
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Jonathan Lawlor
>         Attachments: HBASE-13442-proposal.diff
>
>
> Caching acts more as a row limit now. By default in branch-1+, a Scan is configured with (caching=Integer.MAX_VALUE, maxResultSize=2MB) so that we service scans on the basis of buffer size rather than number of rows. As a result, caching should now only be configured in instances where the user knows that they will only need X rows. Thus, caching should be renamed to something that is more semantically correct such as rowLimit.

