hbase-issues mailing list archives

From "Jonathan Lawlor (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-13541) Deprecate Scan caching in 2.0.0
Date Fri, 01 May 2015 22:55:07 GMT

     [ https://issues.apache.org/jira/browse/HBASE-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Lawlor updated HBASE-13541:
    Attachment: HBASE-13541-WIP.patch

Here's an early WIP patch before it gets much uglier. Since caching has been a core concept
of Scans for so long, it has quite a broad range of usages throughout the codebase. 

The intention, as stated in the description, was to completely strip out all the usages of
caching and deprecate the API. However, it looks like this may not be the way to go. In
certain cases it is useful to have control over how many Results get transferred per RPC.
In particular, such control is useful when:
- The user knows ahead of time they will only require X rows
- The user intends to use caching as a paging mechanism. They want X rows now, they will do
some work, and come back for another X rows.
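
To make the paging workflow concrete, here is a minimal, self-contained sketch. It is a toy model and not the HBase client API: BatchedScannerSketch, nextBatch, and rowsPerRpc are hypothetical names, with rowsPerRpc standing in for the caching value (what Scan.setCaching controls on a real scan).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy stand-in for a scanner (hypothetical, not an HBase class). */
final class BatchedScannerSketch {
    private final List<String> rows; // every row the scan would match, in order
    private final int rowsPerRpc;    // plays the role of the caching value
    private int position = 0;        // scanner position kept between calls
    private int rpcCount = 0;        // number of simulated RPCs issued so far

    BatchedScannerSketch(List<String> rows, int rowsPerRpc) {
        if (rowsPerRpc <= 0) {
            throw new IllegalArgumentException("rowsPerRpc must be positive");
        }
        this.rows = rows;
        this.rowsPerRpc = rowsPerRpc;
    }

    /** Simulates one scan RPC: at most rowsPerRpc rows, empty once exhausted. */
    List<String> nextBatch() {
        rpcCount++;
        int end = Math.min(position + rowsPerRpc, rows.size());
        List<String> batch = new ArrayList<>(rows.subList(position, end));
        position = end;
        return batch;
    }

    int rpcCount() {
        return rpcCount;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("r1", "r2", "r3", "r4", "r5", "r6", "r7");
        BatchedScannerSketch scanner = new BatchedScannerSketch(table, 3);
        List<String> page;
        // Fetch a page, do some work, then come back for the next page.
        while (!(page = scanner.nextBatch()).isEmpty()) {
            System.out.println("page: " + page);
        }
        System.out.println("simulated RPCs: " + scanner.rpcCount());
    }
}
```

The point of the sketch is only that the client decides how many rows come back per round trip, and that the position survives between round trips.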

If both of these workflows could be replicated without caching, it wouldn't be a problem.
However, paging filters cannot accurately reproduce this exact behavior, for two reasons.
First, filters do not carry state when scanning multiple regions. Second, filters have no way
of forcing a response back to the client other than saying that all other rows will be
filtered out (which is not what we want).
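
The first limitation can be sketched with a small simulation. This is not HBase code: the class and method names are hypothetical, and regions are modeled as plain lists of row keys. It assumes the page counter starts fresh in each region, which is the "no state across regions" behavior described above.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of a page limit applied per region versus once per scan. */
final class PerRegionPageFilterSketch {

    /** Rows returned when the page limit is enforced independently in each region. */
    static int rowsWithPerRegionLimit(List<List<String>> regions, int pageSize) {
        int returned = 0;
        for (List<String> region : regions) {
            int seenInRegion = 0; // filter state starts over in every region
            for (String row : region) {
                if (seenInRegion >= pageSize) {
                    break; // "all remaining rows are filtered out" for this region only
                }
                seenInRegion++;
                returned++;
            }
        }
        return returned;
    }

    /** Rows returned when one limit is tracked across the whole scan. */
    static int rowsWithScanWideLimit(List<List<String>> regions, int rowLimit) {
        int returned = 0;
        for (List<String> region : regions) {
            for (String row : region) {
                if (returned >= rowLimit) {
                    return returned; // limit reached: stop the scan entirely
                }
                returned++;
            }
        }
        return returned;
    }

    public static void main(String[] args) {
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("a1", "a2", "a3", "a4"),
                Arrays.asList("b1", "b2", "b3", "b4"),
                Arrays.asList("c1", "c2", "c3", "c4"));
        System.out.println("per-region limit: " + rowsWithPerRegionLimit(regions, 3));
        System.out.println("scan-wide limit:  " + rowsWithScanWideLimit(regions, 3));
    }
}
```

With three regions of four rows each and a page size of 3, the per-region version hands back 9 rows while the scan-wide version hands back 3, which is why a filter alone cannot stand in for a true row limit.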

Thus, it seemed better to repurpose caching as a row limit concept, as we initially wanted
to in HBASE-13442 (we have come full circle...). Of course, alternative naming is up for debate;
we want it to be as clear and true to what is occurring as possible.

What still needs to be done? 
More grooming through the usages of the caching API, as well as references to "caching" in
general (in variable names, method names, javadoc, etc.). Also, auto-generated models, such
as the protobuf models of Scan and ScanMessage as well as the Thrift model TScan, need to be
repurposed to use the new terminology.

> Deprecate Scan caching in 2.0.0
> -------------------------------
>                 Key: HBASE-13541
>                 URL: https://issues.apache.org/jira/browse/HBASE-13541
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Jonathan Lawlor
>         Attachments: HBASE-13541-WIP.patch
> The public Scan API exposes caching to the application. Caching deals with the number
> of rows that are transferred per scan RPC request issued to the server. It does not seem like
> a detail that users of a scan should control and introduces some unneeded complication. Seems
> more like a detail that should be controlled from the server based on the current scan request
> RPC load. This issue proposes that we deprecate the caching API in 2.0.0 so that it can be
> removed later. Of course, if there are any concerns please raise them here.

This message was sent by Atlassian JIRA
