hbase-issues mailing list archives

From "Jonathan Lawlor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13215) A limit on the raw key values is needed for each next call of a scanner
Date Mon, 16 Mar 2015 21:21:40 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363974#comment-14363974 ]

Jonathan Lawlor commented on HBASE-13215:

[~heliangliang] I see, that makes sense to me. Certainly the approach outlined in HBASE-13090
wouldn't be able to provide control as fine-grained as a raw key value limit.

I think we would want the docs for this feature to note that the limit should only be
specified in specific circumstances (such as the use cases you have described above). It
would be nice to have for strict control over RPCs, but it may cause performance degradation
if used without full knowledge of the drawbacks of specifying such a limit. By default we
would probably want this limit to be Long.MAX_VALUE or Integer.MAX_VALUE so that the current
behavior is preserved.
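
A minimal sketch of what such a default could look like, assuming the limit counts every raw
cell examined (including deleted or filtered-out ones). RawLimitSketch, RawCell, NO_RAW_LIMIT
and next are hypothetical names for illustration only; none of them are HBase classes or
methods.

```java
import java.util.ArrayList;
import java.util.List;

class RawLimitSketch {
    /** Hypothetical stand-in for a raw key value; not an HBase class. */
    static final class RawCell {
        final String row;
        final boolean visible; // false = deleted or filtered out
        RawCell(String row, boolean visible) { this.row = row; this.visible = visible; }
    }

    /** Assumed default: effectively "no limit", so existing behavior is unchanged. */
    static final long NO_RAW_LIMIT = Long.MAX_VALUE;

    /**
     * Collects visible cells starting at index start, but stops once rawLimit
     * raw cells have been examined, whether or not any of them were returned.
     */
    static List<RawCell> next(List<RawCell> store, int start, long rawLimit) {
        List<RawCell> results = new ArrayList<>();
        long examined = 0;
        for (int i = start; i < store.size() && examined < rawLimit; i++) {
            examined++;
            RawCell cell = store.get(i);
            if (cell.visible) {
                results.add(cell);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Five consecutive deleted cells followed by one live cell.
        List<RawCell> store = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            store.add(new RawCell("row" + i, false));
        }
        store.add(new RawCell("row5", true));

        System.out.println(next(store, 0, NO_RAW_LIMIT).size()); // prints 1
        System.out.println(next(store, 0, 3).size());            // prints 0
    }
}
```

With the default, the call walks past all the deleted cells before returning. With a small
limit it returns early and possibly empty, which bounds the work per call but costs extra
round trips.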

In terms of saving the scanner position to re-open later, is the saved position the row
key? Does this handle the case where the raw key value limit is reached in the middle
of a row? Or is the limit instead enforced only between rows (i.e. after all the cells for
a particular row have been retrieved, you check the limit and continue only if it has not
been reached)?
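
The two enforcement options in the question above can be sketched as follows. This is an
illustration under assumptions, not the proposed implementation: rows are modeled only by
their raw cell counts, and ScanPositionSketch, Position, stopMidRow and stopBetweenRows are
hypothetical names.

```java
class ScanPositionSketch {
    /** Hypothetical resume point: a row index plus the cells of that row already read. */
    static final class Position {
        final int row;
        final int cellsConsumedInRow;
        Position(int row, int cellsConsumedInRow) {
            this.row = row;
            this.cellsConsumedInRow = cellsConsumedInRow;
        }
    }

    /** Option 1: stop as soon as rawLimit cells were examined, even inside a row. */
    static Position stopMidRow(int[] rawCellsPerRow, long rawLimit) {
        long examined = 0;
        for (int row = 0; row < rawCellsPerRow.length; row++) {
            for (int cell = 0; cell < rawCellsPerRow[row]; cell++) {
                if (examined == rawLimit) {
                    // A row key alone is not enough here; the offset must be saved too.
                    return new Position(row, cell);
                }
                examined++;
            }
        }
        return new Position(rawCellsPerRow.length, 0); // scan finished
    }

    /** Option 2: check the limit only after a whole row has been read. */
    static Position stopBetweenRows(int[] rawCellsPerRow, long rawLimit) {
        long examined = 0;
        for (int row = 0; row < rawCellsPerRow.length; row++) {
            examined += rawCellsPerRow[row]; // may overshoot rawLimit within this row
            if (examined >= rawLimit) {
                return new Position(row + 1, 0); // resume at the start of the next row
            }
        }
        return new Position(rawCellsPerRow.length, 0); // scan finished
    }
}
```

For rows holding 3, 4 and 2 raw cells and a limit of 5, stopMidRow stops at row index 1 with
2 cells consumed, while stopBetweenRows finishes row 1 (7 cells examined) and resumes at row
index 2. The second option keeps the saved position a plain row boundary, at the cost of a
softer limit for wide rows.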

Looking forward to this :)

> A limit on the raw key values is needed for each next call of a scanner
> -----------------------------------------------------------------------
>                 Key: HBASE-13215
>                 URL: https://issues.apache.org/jira/browse/HBASE-13215
>             Project: HBase
>          Issue Type: Improvement
>          Components: Scanners
>            Reporter: He Liangliang
>            Assignee: He Liangliang
> In the current scanner next, there are several limits: caching, batch and size. But there
> is no limit on raw data scanned, so the time consumed by a next call is unbounded. For example,
> many consecutive deleted or filtered-out cells will lead to a socket timeout. This can make
> user code get stuck.
