hbase-issues mailing list archives

From "Jonathan Lawlor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13442) Rename scanner caching to a more semantically correct term such as row limit
Date Tue, 21 Apr 2015 21:44:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505836#comment-14505836 ]

Jonathan Lawlor commented on HBASE-13442:

bq. I don't see why an app should care specifically about how many rows the client transfers
from the server in each RPC - bytes seem the more relevant currency to tune for performance.

Really good point; I can't think of such a scenario either. Certainly we want to return results
from the server on the basis of size rather than some arbitrary number of rows (since row
size can vary from table to table, there isn't a universally "good" row limit). This is supported
by the move to the default configuration of (caching = Integer.MAX_VALUE, maxResultSize =
2 MB). So the best course of action here wouldn't be to rename caching, but to deprecate it
so that it can eventually be removed completely in favor of rowLimit.
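
To make the size-versus-rows point concrete, here is a minimal sketch in plain Java. It is a toy model, not HBase code: the class and method names (ScanBatchSketch, rowsInBatch, batchSizes) are hypothetical, and it assumes that a batch ends as soon as either the row count reaches caching or the accumulated bytes reach maxResultSize, with the size check happening after a row has been added.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT HBase code. All names here are hypothetical.
// Assumption: a batch ends when either limit is reached, whichever comes first.
final class ScanBatchSketch {

    /** Counts the rows, starting at index start, that go into one batch. */
    static int rowsInBatch(long[] rowSizes, int start, int caching, long maxResultSize) {
        if (caching < 1 || maxResultSize < 1) {
            throw new IllegalArgumentException("limits must be positive");
        }
        int rows = 0;
        long bytes = 0;
        for (int i = start; i < rowSizes.length; i++) {
            // Stop once the row limit or the byte budget has been reached.
            if (rows >= caching || bytes >= maxResultSize) {
                break;
            }
            bytes += rowSizes[i];
            rows++;
        }
        return rows;
    }

    /** Splits the whole scan into batches and returns the row count of each. */
    static List<Integer> batchSizes(long[] rowSizes, int caching, long maxResultSize) {
        List<Integer> batches = new ArrayList<>();
        int next = 0;
        while (next < rowSizes.length) {
            int n = rowsInBatch(rowSizes, next, caching, maxResultSize);
            batches.add(n);
            next += n;
        }
        return batches;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        long[] bigRows = {mb, mb, mb, mb, mb};
        long[] smallRows = {100, 100, 100, 100, 100};
        // With the defaults described above, only the byte budget matters.
        System.out.println(batchSizes(bigRows, Integer.MAX_VALUE, 2 * mb));   // [2, 2, 1]
        System.out.println(batchSizes(smallRows, Integer.MAX_VALUE, 2 * mb)); // [5]
        // A small row limit cuts batches regardless of how small the rows are.
        System.out.println(batchSizes(smallRows, 3, 2 * mb));                 // [3, 2]
    }
}
```

In this model the same byte budget yields two rows per batch for 1 MB rows and all five rows in one batch for 100-byte rows, which is why no single row count suits every table.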

The feature in the protocol that allows the client to ask for a certain number of rows would
remain, but would be used only for backwards compatibility and for the scenario where the
client wants to limit itself to a certain number of rows. Makes sense to me.

With such a change, we would also want to remove any associated configurations for caching/rowLimit
in hbase-site.xml and hbase-default.xml. There isn't a scenario (at least that I can think
of) where it would be appropriate to limit all scans to a particular number of rows and then
close them. The row limit would be like the startRow or stopRow settings on scans: configured
on a per-scan basis, with no means to set a global default for all scans.
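
As an illustration of a row limit that lives on the scan itself, alongside startRow and stopRow, here is a small plain-Java sketch. It is not the HBase Scan API: RowLimitSketch, ScanSpec, NO_LIMIT, and scan are hypothetical names, and the table is modeled as an in-memory TreeMap with an inclusive startRow and an exclusive stopRow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative model only, NOT the HBase API. All names are hypothetical.
final class RowLimitSketch {
    static final int NO_LIMIT = -1;

    /** Per-scan settings; there is deliberately no global default to read from. */
    static final class ScanSpec {
        final String startRow; // inclusive
        final String stopRow;  // exclusive
        final int rowLimit;    // NO_LIMIT means "read up to stopRow"

        ScanSpec(String startRow, String stopRow) {
            this(startRow, stopRow, NO_LIMIT);
        }

        ScanSpec(String startRow, String stopRow, int rowLimit) {
            this.startRow = startRow;
            this.stopRow = stopRow;
            this.rowLimit = rowLimit;
        }
    }

    /** Returns the row keys in [startRow, stopRow), stopping early at rowLimit. */
    static List<String> scan(TreeMap<String, String> table, ScanSpec spec) {
        List<String> rows = new ArrayList<>();
        for (String rowKey : table.subMap(spec.startRow, spec.stopRow).keySet()) {
            if (spec.rowLimit != NO_LIMIT && rows.size() >= spec.rowLimit) {
                break; // the caller asked for at most rowLimit rows
            }
            rows.add(rowKey);
        }
        return rows;
    }
}
```

A scan built without a limit reads everything in its key range, while a scan built with rowLimit = 2 stops after two rows; nothing outside the ScanSpec influences either result.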

> Rename scanner caching to a more semantically correct term such as row limit
> ----------------------------------------------------------------------------
>                 Key: HBASE-13442
>                 URL: https://issues.apache.org/jira/browse/HBASE-13442
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Jonathan Lawlor
>         Attachments: HBASE-13442-proposal.diff
> Caching acts more as a row limit now. By default in branch-1+, a Scan is configured with
> (caching=Integer.MAX_VALUE, maxResultSize=2MB) so that we service scans on the basis of buffer
> size rather than number of rows. As a result, caching should now only be configured in instances
> where the user knows that they will only need X rows. Thus, caching should be renamed to something
> that is more semantically correct such as rowLimit.

This message was sent by Atlassian JIRA
