hbase-issues mailing list archives

From "Phil Yang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-15484) Correct the semantic of batch and partial
Date Fri, 11 Nov 2016 08:43:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656513#comment-15656513 ]

Phil Yang edited comment on HBASE-15484 at 11/11/16 8:43 AM:
-------------------------------------------------------------

For caching, we had some discussion in HBASE-16987 and HBASE-16973. A size/time limit is
more direct for users than setCache, because they usually setLimit in order to limit
size or time, and we now set cache to max_value by default.

Paging at the cell level is a possible scenario. It differs from the "limit" that Duo mentions:
limit means we can stop and close the scanner, whereas batch means we should pause and
wait for the next call. Since we have a size/time limit on the server side, a large row will
not cause an OOM on the server even if users don't call setBatch. If users really do need
setBatch to cap the number of cells in one returned Result, I think we can keep the setBatch
interface but change it to client-only logic. On the server we would consider only the
size/time limit, and if the server returns more than batch cells, we can cache the rest of
them on the client? With this change, we can reduce the number of RPC requests without
OOM/timeout risk.
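
To make the client-only setBatch idea concrete, here is a minimal sketch. It is not HBase code: ClientSideBatcher, onServerResponse and next are hypothetical names, cells are modelled as plain strings, and each server response is assumed to carry cells of a single row, limited only by size/time on the server.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper (not an HBase class): the server ignores batch and
// limits responses by size/time only; the client slices what it receives.
final class ClientSideBatcher {
    private final int batch;
    // Pieces already received from the server but not yet handed to the caller.
    private final Deque<List<String>> pending = new ArrayDeque<>();

    ClientSideBatcher(int batch) {
        if (batch <= 0) {
            throw new IllegalArgumentException("batch must be positive");
        }
        this.batch = batch;
    }

    // Called once per RPC response. Splits the returned cells into pieces
    // of at most `batch` cells and caches all of them on the client.
    void onServerResponse(List<String> cells) {
        for (int from = 0; from < cells.size(); from += batch) {
            int to = Math.min(from + batch, cells.size());
            pending.add(new ArrayList<>(cells.subList(from, to)));
        }
    }

    // Returns the next cached piece, or null when the cache is empty
    // and the caller would have to issue another RPC.
    List<String> next() {
        return pending.poll();
    }
}
```

With batch = 2 and a single response of five cells, the caller gets three pieces (2, 2 and 1 cells) from one RPC instead of three. A real implementation would also have to decide how to handle a row that spans several responses, which this sketch leaves out.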

[~stack] [~carp84] [~mantonov] FYI, you also had some ideas about scanning in HBASE-16973
:) Thanks.


was (Author: yangzhe1991):
For caching, we had some discussion in HBASE-16987 and HBASE-16973. A size/time limit is
more direct for users than setCache, because they usually setLimit in order to limit
size or time, and we now set cache to max_value by default.

Paging at the cell level is a possible scenario. It differs from the "limit" that Duo mentions:
limit means we can stop and close the scanner, whereas batch means we should pause and
wait for the next call. Since we have a size/time limit on the server side, a large row will
not cause an OOM on the server even if users don't call setBatch. If users really do need
setBatch to cap the number of cells in one returned Result, I think we can keep the setBatch
interface but change it to client-only logic. On the server we would consider only the
size/time limit, and if the server returns more than batch cells, we can cache them on the
client? With this change, we can reduce the number of RPC requests without OOM/timeout risk.

[~stack] [~carp84] [~mantonov] FYI, you also had some ideas about scanning in HBASE-16973
:) Thanks.

> Correct the semantic of batch and partial
> -----------------------------------------
>
>                 Key: HBASE-15484
>                 URL: https://issues.apache.org/jira/browse/HBASE-15484
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.1.3
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15484-v1.patch, HBASE-15484-v2.patch, HBASE-15484-v3.patch, HBASE-15484-v4.patch
>
>
> Follow-up to HBASE-15325. As discussed, setBatch and setAllowPartialResults should not have the same meaning, and we should not treat setBatch as setAllowPartialResults.
> isPartial should also be defined accurately.
> (Assume getBatch==MaxInt if setBatch is not called.) If result.rawcells.length<scan.getBatch && result is not the last part of this row, isPartial==true; otherwise isPartial==false. So if the user doesn't call setAllowPartialResults(true), isPartial should always be false.
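
The isPartial rule in the description can be written as a one-line predicate. This is only a sketch of the proposed definition, not the HBase implementation: PartialRule and its parameter names are hypothetical, and batch is taken to be Integer.MAX_VALUE when setBatch was never called, as the description assumes.

```java
// Hypothetical illustration of the proposed rule (not an HBase class).
final class PartialRule {
    private PartialRule() {
    }

    // cellCount:     number of cells in this Result (result.rawcells.length)
    // batch:         scan.getBatch, or Integer.MAX_VALUE if setBatch was not called
    // lastPartOfRow: true if this Result holds the final cells of its row
    static boolean isPartial(int cellCount, int batch, boolean lastPartOfRow) {
        return cellCount < batch && !lastPartOfRow;
    }
}
```

Under this rule, a Result that holds exactly batch cells is never partial, and neither is the last piece of a row, however few cells it has.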



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
