hbase-issues mailing list archives

From "Jianwei Cui (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction
Date Fri, 26 Feb 2016 10:14:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168771#comment-15168771 ]

Jianwei Cui commented on HBASE-15340:

[~anoop.hbase], thanks for your comment, I get your point :). Yes, the case you mentioned will
happen. The page https://hbase.apache.org/acid-semantics.html explains the consistency guarantees
for scans:
A scan is not a consistent view of a table. Scans do not exhibit snapshot isolation.

Rather, scans have the following properties:

1. Any row returned by the scan will be a consistent view (i.e. that version of the complete
row existed at some point in time) [1]
2. A scan will always reflect a view of the data at least as new as the beginning of the scan.
This satisfies the visibility guarantees enumerated below.
    1. For example, if client A writes data X and then communicates via a side channel to
client B, any scans started by client B will contain data at least as new as X.
    2. A scan _must_ reflect all mutations committed prior to the construction of the scanner,
and _may_ reflect some mutations committed subsequent to the construction of the scanner.
    3. Scans must include all data written prior to the scan (except in the case where data
is subsequently mutated, in which case it _may_ reflect the mutation)
It seems the consistency guarantee for a scan only covers reading data at least as new as the
beginning of the scan. It makes no guarantee about whether data written concurrently with the
scan, or after the scan began, will be read.

At the end of the page:
[1] A consistent view is not guaranteed intra-row scanning -- i.e. fetching a portion of a
row in one RPC then going back to fetch another portion of the row in a subsequent RPC. Intra-row
scanning happens when you set a limit on how many values to return per Scan#next (See Scan#setBatch(int)).
This footnote describes the problem in this JIRA: a row-level consistent view is not guaranteed
for intra-row scanning. So is this a known problem?
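
To make the footnote concrete, here is a minimal plain-Java sketch of what a batch limit does to one row. It is not HBase client code: the class BatchSplitSketch and the method splitIntoBatches are hypothetical names, and the cells are just column-name strings. It only illustrates that a row with more cells than the batch size comes back as several partial results, which, as the footnote says, may be fetched in separate RPCs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this does not use the HBase client API.
final class BatchSplitSketch {

    // Split the cells of one row into chunks of at most `batch` cells,
    // the way a batch limit turns one row into several partial results.
    static List<List<String>> splitIntoBatches(List<String> rowCells, int batch) {
        if (batch <= 0) {
            throw new IllegalArgumentException("batch must be positive");
        }
        List<List<String>> partials = new ArrayList<>();
        for (int i = 0; i < rowCells.size(); i += batch) {
            int end = Math.min(i + batch, rowCells.size());
            partials.add(new ArrayList<>(rowCells.subList(i, end)));
        }
        return partials;
    }

    public static void main(String[] args) {
        // One row with two columns and a batch of 1, as in the steps of this issue.
        List<String> row = Arrays.asList("F:c1", "F:c2");
        System.out.println(splitIntoBatches(row, 1)); // prints [[F:c1], [F:c2]]
    }
}
```

Nothing in this sketch ties the second chunk to the point in time at which the first chunk was read, which is the gap the footnote points out.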

> Partial row result of scan may return data violates the row-level transaction 
> ------------------------------------------------------------------------------
>                 Key: HBASE-15340
>                 URL: https://issues.apache.org/jira/browse/HBASE-15340
>             Project: HBase
>          Issue Type: Bug
>          Components: Scanners, Transactions/MVCC
>    Affects Versions: 2.0.0
>            Reporter: Jianwei Cui
> There are cases where the region server will return a partial row result, such as when the
client sets a batch for the scan or the configured size limit is reached. In these situations,
the client may return data that violates the row-level transaction to the application. The
following steps show the problem:
> {code}
> // assume there is a test table 'test_table' with one family 'F' and one region 'region'.
> // meanwhile there are two region servers 'rsA' and 'rsB'.
> 1. Let 'region' be located on 'rsA' first, and put one row with two columns 'c1' and 'c2':
>     > put 'test_table', 'row', 'F:c1', 'value1', 'F:c2', 'value1'
> 2. Start a client to scan 'test_table' with scan.setBatch(1) and scan.setCaching(1).
The client will get one column, {column='F:c1', value='value1'}, in the first RPC call
after the scanner is created, and the result will be returned to the application.
> 3. Before the client issues the next request, 'region' is moved to 'rsB' and accepts
another mutation for the two columns 'c1' and 'c2':
>     > put 'test_table', 'row', 'F:c1', 'value2', 'F:c2', 'value2'
> 4. Then the client will receive a RegionMovedException when issuing the next request and
will retry by opening a scanner on 'rsB'. The newly opened scanner will have a higher mvcc than
the old data, so it can read out the column {column='F:c2', value='value2'} and return the
result to the application.
>    Therefore, the application will get data as:
> 'row'    column='F:c1'   value='value1'
> 'row'    column='F:c2'   value='value2'
>    The returned data is combined from two different mutations and violates the row-level
transaction.
> {code}
> The reason is that the scanner newly opened after the region moves gets a different mvcc.
I am not sure whether this result is by design for scans when partial row results are allowed.
However, a row result combined from different transactions may leave the application in an
unexpected state.
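
The steps above can be reproduced with a small plain-Java simulation. This is a sketch under stated assumptions, not HBase code: the classes MvccScanSketch, Region and Cell, and the method scanWithReopen, are hypothetical. A single counter stands in for the mvcc write number, and the region move is modelled only as the scanner taking a fresh read point halfway through the row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and structure are not taken from HBase.
final class MvccScanSketch {

    // One versioned cell; seqId stands in for the mvcc write number.
    static final class Cell {
        final String column;
        final String value;
        final long seqId;

        Cell(String column, String value, long seqId) {
            this.column = column;
            this.value = value;
            this.seqId = seqId;
        }
    }

    // Toy region: an append-only cell list plus the last committed write number.
    static final class Region {
        final List<Cell> cells = new ArrayList<>();
        long committed = 0;

        // Commit all columns of one row mutation under a single write number.
        void put(String... columnValuePairs) {
            long seq = ++committed;
            for (int i = 0; i < columnValuePairs.length; i += 2) {
                cells.add(new Cell(columnValuePairs[i], columnValuePairs[i + 1], seq));
            }
        }

        // Newest value of the column that is visible at readPoint, or null.
        String read(String column, long readPoint) {
            Cell best = null;
            for (Cell c : cells) {
                boolean visible = c.column.equals(column) && c.seqId <= readPoint;
                if (visible && (best == null || c.seqId > best.seqId)) {
                    best = c;
                }
            }
            return best == null ? null : best.value;
        }
    }

    // Scan one row cell by cell (batch = 1). If reopen is true, the scanner is
    // re-created between the two cells and takes a fresh read point.
    static List<String> scanWithReopen(boolean reopen) {
        Region region = new Region();
        region.put("F:c1", "value1", "F:c2", "value1");   // step 1
        long readPoint = region.committed;                 // step 2: scanner opened
        List<String> out = new ArrayList<>();
        out.add("F:c1=" + region.read("F:c1", readPoint));
        region.put("F:c1", "value2", "F:c2", "value2");   // step 3: second mutation
        if (reopen) {
            readPoint = region.committed;                  // step 4: new scanner
        }
        out.add("F:c2=" + region.read("F:c2", readPoint));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(scanWithReopen(false)); // prints [F:c1=value1, F:c2=value1]
        System.out.println(scanWithReopen(true));  // prints [F:c1=value1, F:c2=value2]
    }
}
```

In this toy model, keeping the original read point across the re-open returns both columns from the first mutation, while taking a fresh read point mixes the two mutations as described in the issue. Whether the original read point could still be honoured after a region move in a real cluster is outside what this sketch shows.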
