hbase-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9797) Multi row transactions are not atomic for scanners
Date Thu, 17 Oct 2013 23:22:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798587#comment-13798587 ]

Sergey Shelukhin commented on HBASE-9797:

The first approach will only work for small scans, period...

> Multi row transactions are not atomic for scanners
> --------------------------------------------------
>                 Key: HBASE-9797
>                 URL: https://issues.apache.org/jira/browse/HBASE-9797
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
> Multi-row atomic puts, as implemented by the coprocessor API, are atomic for gets and
multi-gets, but not for scanners.
> The mvcc read point, as of today, is kept only in RS memory. When a client starts a scan,
we create a new scanner object and save the scan's mvcc read point there. Since the scan
API is row-based, scan results are made visible to clients row by row, and the client
scanner keeps track of the last row seen.
> So, for a multi-row atomic update, the scanner might get an mvcc read number that is less
than the commit point of the multi-row update, so it will skip some rows in the scan (it will
not see the rows written by the update). However, in case of RS failover, a new scanner will
be created with an mvcc read number larger than the multi-row update's commit number, so that
scanner will see the remaining rows from the transaction.
> Example: 
> {code}
> multi put : { {row1, c1, v1}, {row100, c1, v100} } mvcc write number = 2
> scan : scan from row1 to row100  mvcc read number = 1
> {code}
> The scanner will not see row1. If the RS fails before the scanner reaches row100, the new
scanner will get an mvcc read number > 2, so it will see row100.
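
A minimal Python sketch of the anomaly described above, as a toy model rather than HBase code. The names `Cell` and `scan` are hypothetical, row keys are zero-padded so that they sort in scan order, and an extra `row050` written before the scan is assumed so that the client has a last-seen row to resume from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    value: str
    mvcc: int  # write number assigned when the cell was committed


def scan(cells, read_point, start_after=None):
    """Yield rows visible at read_point, in row order, optionally resuming after a row."""
    for cell in sorted(cells, key=lambda c: c.row):
        if start_after is not None and cell.row <= start_after:
            continue  # the client already saw (or passed) this row
        if cell.mvcc <= read_point:
            yield cell.row


# The multi put {row001, row100} commits with mvcc write number 2.
# row050 is an assumed pre-existing row with write number 1.
store = [
    Cell("row001", "v1", 2),
    Cell("row050", "v50", 1),
    Cell("row100", "v100", 2),
]

# The first scanner opens with read point 1, so it skips row001.
first = scan(store, read_point=1)
seen = [next(first)]

# The region server fails before the scanner reaches row100.
# The replacement scanner gets a read point > 2 and resumes after the last row seen.
resumed = scan(store, read_point=3, start_after=seen[-1])
seen.extend(resumed)

print(seen)  # ['row050', 'row100']
```

The client ends up with row100 but not row001, which is half of the transaction.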
> There might be a couple of ways to fix this. The first approach (as suggested by Sergey)
is to wrap the Scanner in an atomic scanner implementation, which restarts the scan in case
of a socket timeout, server failure, etc. It batches up the results so that rows from a
partially completed scan are not visible to the caller. For small scans (like meta) this
might be viable.
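
The first approach could look roughly like the following Python sketch. It is a toy model, not the HBase client API: `ScanFailure`, `atomic_scan`, and `flaky_scanner` are hypothetical names, and the assumption is that each call to `open_scanner()` starts a fresh scan with a fresh read point.

```python
class ScanFailure(Exception):
    """Stands in for a socket timeout or region server failure (hypothetical)."""


def atomic_scan(open_scanner, max_attempts=3):
    """Buffer a whole scan and return it only if it completed without a failure."""
    for _ in range(max_attempts):
        buffered = []
        try:
            for row in open_scanner():
                buffered.append(row)
        except ScanFailure:
            continue  # discard partial results and restart from the first row
        return buffered
    raise ScanFailure(f"scan did not complete in {max_attempts} attempts")


attempts = {"count": 0}


def flaky_scanner():
    """Fails once after returning the first row, then succeeds."""
    attempts["count"] += 1
    yield "row1"
    if attempts["count"] == 1:
        raise ScanFailure("region server went away")
    yield "row100"


rows = atomic_scan(flaky_scanner)
print(rows, attempts["count"])
```

Because every row is held in memory until the scan finishes, the cost grows with the size of the result, which is consistent with the point that this only suits small scans.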
> The second way, which would fix this properly, is to first finish the patch at HBASE-8763,
then change the scanner to obtain an mvcc number from the RS when the scanner is opened, and
save that mvcc number on the client side. Upon failure, the scanner will continue the scan
where it left off, using the saved number. We will have to keep the low watermark (the
smallest mvcc read number among the currently open scanners) differently. That number is
already tracked today, but not across RS failover. I think we can use timeouts to manage the
low watermark.
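
One way to picture timeout-managed low watermark tracking is the Python sketch below. It is an illustration under stated assumptions, not a design from the issue: `ReadPointTracker`, its lease model, and the injectable `clock` are all hypothetical, and how the leases would survive or be rebuilt after RS failover is left out.

```python
import time


class ReadPointTracker:
    """Toy low watermark tracker: each scanner holds its read point under a lease."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.leases = {}  # scanner id -> (read point, expiry time)

    def register(self, scanner_id, read_point):
        """Record (or renew) a scanner's read point for one lease period."""
        self.leases[scanner_id] = (read_point, self.clock() + self.lease_seconds)

    def low_watermark(self, current_read_point):
        """Smallest read point among unexpired scanners, else the current read point."""
        now = self.clock()
        self.leases = {
            sid: (rp, expiry)
            for sid, (rp, expiry) in self.leases.items()
            if expiry > now
        }
        return min((rp for rp, _ in self.leases.values()), default=current_read_point)


# A fake clock keeps the example deterministic.
now = [0.0]
tracker = ReadPointTracker(lease_seconds=60, clock=lambda: now[0])
tracker.register("scanner-a", 1)
tracker.register("scanner-b", 5)
before = tracker.low_watermark(current_read_point=9)

now[0] = 61.0  # both leases have expired without renewal
after = tracker.low_watermark(current_read_point=9)
print(before, after)  # 1 9
```

In this model a scanner that stops renewing its lease no longer holds the watermark back, at the price that a slow client may find its read point is no longer honored.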
> This approach also enables us to implement a cell-based streaming scan instead of the
row-based approach we have today.
> Opened the issue, so that it is tracked. Feel free to pick it up if you like. 

This message was sent by Atlassian JIRA
