hbase-issues mailing list archives

From "Phil Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15398) Cells loss or disorder when using family essential filter and partial scanning protocol
Date Mon, 21 Mar 2016 06:58:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203828#comment-15203828 ]

Phil Yang commented on HBASE-15398:

Before the partial protocol was supported, we already had family essential filters. They are
mainly used to reduce scanning time: if a row is skipped, we never read any data from
joinedHeap. That still works now, right? So at least we won't make anything worse.

Let me summarize what we have:

If we don't want to break the assumption that the client only receives sorted cells, we have to
disable the partial protocol whenever the partial results may arrive out of order.
Based on this:

(1) Before this issue, we allowed users to combine family essential filters with hasFilterRow
returning false (this must be a user-defined filter). We will ban this once the issue is
resolved, because we cannot use the partial protocol here and we don't want to increase the
probability of OOM/timeout.
(2) If a user combines family essential filters with hasFilterRow returning true (e.g. SCVF),
then before this issue the order of results is wrong when the partial protocol is used. After
the issue is resolved we will disable the partial protocol, so here we will increase the
probability of OOM/timeout.
(3) In 1.1.0-1.1.3 and 1.2.0, we allow users to use setAllowPartial(true) and family essential
filters at the same time, but the order of results is wrong. After the issue is resolved we
will ban this usage because we must disable the partial protocol.
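
As a rough illustration of cases (1) to (3) above, the proposed behaviour can be written as a
small decision function. This is only a sketch of one reading of the summary, not code from any
patch: the class, enum, method and parameter names are all hypothetical, and the precedence
between the cases (checking setAllowPartial(true) first) is an assumption.

```java
public class PartialProtocolDecision {
    /** Hypothetical outcomes for a scan request after this issue is resolved. */
    enum Outcome { PARTIAL_ALLOWED, PARTIAL_DISABLED, REJECTED }

    /**
     * Sketch of the proposed rules. All parameter names are illustrative:
     * usesEssentialFamilyFilter: the filter is not essential for every family,
     *                            so some stores end up in joinedHeap
     * hasFilterRow:              what the filter's hasFilterRow() returns
     * allowPartial:              the user called setAllowPartial(true)
     */
    static Outcome decide(boolean usesEssentialFamilyFilter,
                          boolean hasFilterRow,
                          boolean allowPartial) {
        if (!usesEssentialFamilyFilter) {
            // No joinedHeap involved, so partial results stay in sorted order.
            return Outcome.PARTIAL_ALLOWED;
        }
        if (allowPartial) {
            // Case (3): partial results were explicitly requested but cannot be ordered.
            return Outcome.REJECTED;
        }
        if (!hasFilterRow) {
            // Case (1): banned rather than risking more OOM/timeout without the partial protocol.
            return Outcome.REJECTED;
        }
        // Case (2): e.g. SCVF; the scan still runs, but whole rows are returned.
        return Outcome.PARTIAL_DISABLED;
    }

    public static void main(String[] args) {
        System.out.println("no essential filter: " + decide(false, false, true));
        System.out.println("case (1): " + decide(true, false, false));
        System.out.println("case (2): " + decide(true, true, false));
        System.out.println("case (3): " + decide(true, true, true));
    }
}
```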

Anything to add, or any concerns?

> Cells loss or disorder when using family essential filter and partial scanning protocol
> ---------------------------------------------------------------------------------------
>                 Key: HBASE-15398
>                 URL: https://issues.apache.org/jira/browse/HBASE-15398
>             Project: HBase
>          Issue Type: Bug
>          Components: dataloss, Scanners
>    Affects Versions: 1.2.0, 1.1.3
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>            Priority: Critical
>         Attachments: 15398-test.txt, HBASE-15398-v2.patch, HBASE-15398-v3.patch, HBASE-15398-v4.patch, HBASE-15398-v5.patch, HBASE-15398.v1.txt
> In RegionScannerImpl, we have two heaps, storeHeap and joinedHeap. If we have a filter
> that doesn't apply to all column families, the stores whose families needn't be filtered
> will be in joinedHeap. We scan storeHeap first, then joinedHeap, then merge and sort the
> results and return them to the client. We need to sort because the order of Cells is
> rowkey/cf/cq/ts and a smaller cf may be in joinedHeap.
> However, after HBASE-11544 we may transfer partial results when we get SIZE_LIMIT_REACHED_MID_ROW
> or other similar states. We may return a larger cf first because it is in storeHeap, and then
> a smaller cf because it is in joinedHeap. The server won't hold all cells in a row and the
> client has no sorting logic, so the order of cf in the Result seen by the user is wrong.
> A more critical bug: if we get a LIMIT_REACHED_MID_ROW on the last cell of a row in
> storeHeap, we break out of scanning in RegionScannerImpl, and in populateResult we change
> the state to SIZE_LIMIT_REACHED because the next peeked cell belongs to the next row. But
> this is only the last cell of one heap, and we have two. SIZE_LIMIT_REACHED means this Result
> is not partial (per ScannerContext.partialResultFormed), so the client will see it, merge the
> results, and return them to the user without the data from joinedHeap. On the next scan we
> read the next row of storeHeap, and joinedHeap for this row is forgotten and never read.
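
The ordering problem in the quoted description can be reproduced with a toy model. The sketch
below does not use any HBase classes: Cell, ORDER, fullRow and partialStream are hypothetical
stand-ins, and the comparator assumes row/family/qualifier ascending with newer timestamps first.
It shows only that sorting after both heaps are read yields ordered cells, while emitting the
storeHeap cells before the joinedHeap cells does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class HeapOrderSketch {
    /** Hypothetical simplified cell; not the HBase Cell interface. */
    static final class Cell {
        final String row, family, qualifier;
        final long ts;
        Cell(String row, String family, String qualifier, long ts) {
            this.row = row; this.family = family; this.qualifier = qualifier; this.ts = ts;
        }
        @Override public String toString() {
            return row + "/" + family + ":" + qualifier + "@" + ts;
        }
    }

    /** rowkey/cf/cq ascending, then timestamp descending (newer first). */
    static final Comparator<Cell> ORDER = Comparator
            .comparing((Cell c) -> c.row)
            .thenComparing(c -> c.family)
            .thenComparing(c -> c.qualifier)
            .thenComparing(Comparator.comparingLong((Cell c) -> c.ts).reversed());

    /** Whole-row mode: read both heaps, then sort before returning. */
    static List<Cell> fullRow(List<Cell> storeHeap, List<Cell> joinedHeap) {
        List<Cell> merged = new ArrayList<>(storeHeap);
        merged.addAll(joinedHeap);
        merged.sort(ORDER);
        return merged;
    }

    /** Partial mode: storeHeap cells are sent first, joinedHeap cells later, with no sort. */
    static List<Cell> partialStream(List<Cell> storeHeap, List<Cell> joinedHeap) {
        List<Cell> seenByClient = new ArrayList<>(storeHeap);
        seenByClient.addAll(joinedHeap);
        return seenByClient;
    }

    static boolean isSorted(List<Cell> cells) {
        for (int i = 1; i < cells.size(); i++) {
            if (ORDER.compare(cells.get(i - 1), cells.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The filtered family "b" lives in storeHeap; the smaller family "a" is in joinedHeap.
        List<Cell> storeHeap = Arrays.asList(new Cell("r1", "b", "q1", 2L));
        List<Cell> joinedHeap = Arrays.asList(new Cell("r1", "a", "q1", 1L));

        System.out.println("full row: " + fullRow(storeHeap, joinedHeap));
        System.out.println("partial:  " + partialStream(storeHeap, joinedHeap));
        System.out.println("partial sorted? " + isSorted(partialStream(storeHeap, joinedHeap)));
    }
}
```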
