Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Fri, 1 May 2015 22:55:07 +0000 (UTC)
From: "Jonathan Lawlor (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12823468.1429807646000.53547.1430520907743@Atlassian.JIRA>
In-Reply-To: <JIRA.12823468.1429807646000@Atlassian.JIRA>
References: <JIRA.12823468.1429807646000@Atlassian.JIRA>
 <JIRA.12823468.1429807646779@arcas>
Subject: [jira] [Updated] (HBASE-13541) Deprecate Scan caching in 2.0.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HBASE-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Lawlor updated HBASE-13541:
------------------------------------
    Attachment: HBASE-13541-WIP.patch

Here's an early WIP patch before it gets much uglier. Since caching has been a core concept of Scans for so long, it is used widely throughout the codebase.

The intention, as stated in the description, was to strip out all usages of caching and deprecate the API. However, it looks like this may not be the way to go. In some cases it is useful to have control over how many Results get transferred per RPC, namely when:
- The user knows ahead of time they will only require X rows
- The user intends to use caching as a paging mechanism. They want X rows now, they will do some work, and come back for another X rows.
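
A rough sketch of the paging workflow, in plain Java with no HBase dependency. PagedScanSketch, nextPage and rowsPerRpc are hypothetical names used only for illustration; they are not part of the patch or of the HBase API. Each nextPage() call stands in for one scan RPC that returns at most X rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "caching as paging"; not HBase code.
final class PagedScanSketch {
    private final List<String> rows;  // stand-in for the table's rows, in scan order
    private final int rowsPerRpc;     // plays the role of the caching value X
    private int position = 0;         // scanner state kept between calls
    private int rpcCount = 0;         // number of simulated RPCs issued so far

    PagedScanSketch(List<String> rows, int rowsPerRpc) {
        if (rowsPerRpc <= 0) {
            throw new IllegalArgumentException("rowsPerRpc must be positive");
        }
        this.rows = rows;
        this.rowsPerRpc = rowsPerRpc;
    }

    /** Simulates one scan RPC: returns at most rowsPerRpc rows and remembers where it stopped. */
    List<String> nextPage() {
        rpcCount++;
        int end = Math.min(position + rowsPerRpc, rows.size());
        List<String> page = new ArrayList<>(rows.subList(position, end));
        position = end;
        return page;
    }

    int rpcCount() {
        return rpcCount;
    }
}
```

The caller takes one page, does its work, and calls nextPage() again when it wants the next X rows; an empty page means the scan is exhausted.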

If both of these workflows could be replicated without caching, it wouldn't be a problem. However, paging filters cannot accurately reproduce this behavior, for two reasons. First, filters do not carry state from one region to the next when a scan spans multiple regions. Second, filters have no way of forcing a response back to the client other than saying that all remaining rows will be filtered out (which is not what we want).
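
To make the first point concrete, here is a toy model in plain Java (no HBase dependency). PagingSketch, PageLimitFilter, scanWithFilter and scanWithRowLimit are hypothetical names for illustration only. It assumes the filter's counter starts fresh for each region, which is the "no state across regions" problem described above, and compares that with a single row limit tracked on the client side.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model; regions are just lists of row keys.
final class PagingSketch {
    /** Stand-in for a paging filter: accepts rows until its own counter reaches the limit. */
    static final class PageLimitFilter {
        private final int limit;
        private int seen = 0;

        PageLimitFilter(int limit) {
            this.limit = limit;
        }

        boolean accept() {
            return seen++ < limit;
        }
    }

    /** Each region gets a fresh filter, so the counter restarts at every region boundary. */
    static List<String> scanWithFilter(List<List<String>> regions, int limit) {
        List<String> out = new ArrayList<>();
        for (List<String> region : regions) {
            PageLimitFilter filter = new PageLimitFilter(limit);
            for (String row : region) {
                if (!filter.accept()) {
                    break; // this region is done, but the next region starts counting from zero
                }
                out.add(row);
            }
        }
        return out;
    }

    /** A row limit tracked in one place: a single counter spans all regions. */
    static List<String> scanWithRowLimit(List<List<String>> regions, int limit) {
        List<String> out = new ArrayList<>();
        for (List<String> region : regions) {
            for (String row : region) {
                if (out.size() == limit) {
                    return out;
                }
                out.add(row);
            }
        }
        return out;
    }
}
```

With two regions of three rows each and a limit of 2, scanWithFilter returns four rows (two per region), while scanWithRowLimit returns exactly two.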

Thus, it seemed better to repurpose caching as a row limit concept, as we initially wanted to in HBASE-13442 (we have come full circle...). Of course, alternative naming is up for debate; we want it to be as clear and true to what is occurring as possible.

What still needs to be done? 
More grooming through the usages of the caching API as well as references to "caching" in general (in variable names, method names, javadoc, etc.). Also, auto-generated models, such as the protobuf models of Scan and ScanMessage as well as the Thrift model TScan, need to be repurposed to use the new terminology.

> Deprecate Scan caching in 2.0.0
> -------------------------------
>
>                 Key: HBASE-13541
>                 URL: https://issues.apache.org/jira/browse/HBASE-13541
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Jonathan Lawlor
>         Attachments: HBASE-13541-WIP.patch
>
>
> The public Scan API exposes caching to the application. Caching deals with the number of rows that are transferred per scan RPC request issued to the server. It does not seem like a detail that users of a scan should control and introduces some unneeded complication. Seems more like a detail that should be controlled from the server based on the current scan request RPC load. This issue proposes that we deprecate the caching API in 2.0.0 so that it can be removed later. Of course, if there are any concerns please raise them here.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)