hbase-dev mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-9797) Multi row transactions are not atomic for scanners
Date Thu, 17 Oct 2013 23:20:42 GMT
Enis Soztutar created HBASE-9797:

             Summary: Multi row transactions are not atomic for scanners
                 Key: HBASE-9797
                 URL: https://issues.apache.org/jira/browse/HBASE-9797
             Project: HBase
          Issue Type: Bug
            Reporter: Enis Soztutar

Multi-row atomic puts, as implemented by the coprocessor API, are atomic for gets and
multi-gets, but not for scanners.

The mvcc read point, as of today, is kept only in RS memory. When a client starts a scan, we
create a new scanner object and save the mvcc read point of the scan there. Since the scan
API is row-based, scan results are made visible to clients row by row, and the client
scanner keeps track of the last row seen.

So, for a multi-row atomic update, the scanner might get an mvcc read number that is less than
the commit point of the multi-row update, so it will skip the rows of that update that it
scans past (it will not see them). However, in case of RS failover, a new scanner will be
created with an mvcc read number larger than the multi-row update's commit number, so that
scanner will see the remaining rows from the transaction. The client ends up seeing only part
of the update.

multi put : { {row1, c1, v1}, {row100, c1, v100} } mvcc write number = 2
scan : scan from row1 to row100  mvcc read number = 1

The scanner will not see row1. If the RS fails before the scanner reaches row100, the new
scanner will get an mvcc read number > 2, so it will see row100.
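
The scenario above can be reproduced with a small toy model. This is only a sketch, not HBase
code: ToyRegion, multi_put, open_scanner and scan are hypothetical names, rows are zero-padded
(row001 stands for row1), an extra row050 is added so the scanner has something to return
before the failover, and the model assumes a cell is visible when its write number is at most
the scanner's read point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    value: str
    write_number: int  # mvcc write number of the commit that wrote this cell


class ToyRegion:
    """Hypothetical stand-in for a region; read points live only in server memory."""

    def __init__(self):
        self.cells = []
        self.latest_write_number = 0

    def multi_put(self, puts):
        # Every row in one atomic put shares a single mvcc write number.
        self.latest_write_number += 1
        for row, value in puts:
            self.cells.append(Cell(row, value, self.latest_write_number))
        return self.latest_write_number

    def open_scanner(self):
        # A newly opened scanner takes the current read point.
        return self.latest_write_number

    def scan(self, read_point, start_after=None):
        # Yield visible rows in row order, optionally resuming after a given row.
        for cell in sorted(self.cells, key=lambda c: c.row):
            if start_after is not None and cell.row <= start_after:
                continue
            if cell.write_number <= read_point:  # assumed visibility rule
                yield cell.row


def rows_seen_across_failover():
    region = ToyRegion()
    region.multi_put([("row050", "v50")])                      # write number 1
    read_point = region.open_scanner()                         # read point 1
    region.multi_put([("row001", "v1"), ("row100", "v100")])   # write number 2

    # The first scanner skips row001 (write number 2 > read point 1).
    seen = [next(region.scan(read_point))]

    # Simulated RS failover: the old read point is lost, so the client opens a
    # new scanner, gets a fresh read point, and resumes after the last row seen.
    new_read_point = region.open_scanner()                     # read point 2
    seen.extend(region.scan(new_read_point, start_after=seen[-1]))
    return seen
```

In this model the client ends up with row050 and row100 but never row001, so it has observed
only half of the multi-row put.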

There might be a couple of ways to fix this. The first approach (as suggested by Sergey) is to
wrap the Scanner in an atomic scanner implementation, which restarts the scan in case of a
socket timeout, server failure, etc. It would buffer the results so that rows are not visible
to the caller until the scan completes. For small scans (like meta) this might be viable.
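
A minimal sketch of that first approach, under stated assumptions: atomic_scan, ScanFailed and
make_flaky_scan are hypothetical names, not HBase APIs, and the wrapped scan is modelled as a
callable that returns a fresh row iterator each time it is opened.

```python
class ScanFailed(Exception):
    """Hypothetical stand-in for a socket timeout or region server failure."""


def atomic_scan(open_scan, max_attempts=3):
    """Run a scan to completion, restarting from scratch on failure.

    Rows are buffered and returned only when a single attempt finishes, so the
    caller never sees rows that came from two different read points.
    """
    last_error = None
    for _ in range(max_attempts):
        buffered = []
        try:
            for row in open_scan():
                buffered.append(row)
        except ScanFailed as error:
            last_error = error
            continue  # discard the partial results and restart the scan
        return buffered
    raise ScanFailed(f"scan did not complete after {max_attempts} attempts") from last_error


def make_flaky_scan():
    """Build a scan that fails mid-way on the first attempt, then succeeds."""
    attempts = {"count": 0}

    def open_scan():
        attempts["count"] += 1
        if attempts["count"] == 1:
            def partial():
                yield "row050"  # seen under the old read point
                raise ScanFailed("region server went away")
            return partial()
        # The retry runs under a newer read point and sees the whole update.
        return iter(["row001", "row050", "row100"])

    return open_scan
```

The cost is that the whole result set sits in memory and a failure late in the scan repeats
all the work, which is why this only looks viable for small scans.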

The second way, which fixes this properly, is to first finish the patch at HBASE-8763, then
change the scanner to obtain an mvcc number from the RS at scanner open and save that number
on the client side. Upon failure, the scanner will continue the scan where it left off, using
the saved number. We will have to track the low watermark (the smallest mvcc read number among
the currently open scanners) differently. That number is already tracked today, but not across
RS failover. I think we can use timeouts to manage the low watermark.
This approach would also enable us to implement a cell-based streaming scan instead of the
row-based approach we have today.
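
A rough sketch of the second approach, again as a toy model rather than HBase code. ToyServer,
its methods and the lease length are hypothetical; the model assumes that write numbers
survive failover and that each open scanner holds a time-limited lease on its read point,
which is one possible reading of "timeouts" above.

```python
class ToyServer:
    """Hypothetical region server over cell storage that survives failover."""

    def __init__(self, cells, lease_seconds=60.0):
        self.cells = cells            # shared list of (row, value, write_number)
        self.lease_seconds = lease_seconds
        self.leases = {}              # scanner id -> (read point, lease expiry time)

    def latest_write_number(self):
        return max((w for _, _, w in self.cells), default=0)

    def open_scanner(self, scanner_id, now, read_point=None):
        # A first open hands out the current read point. A reopen after
        # failover passes in the read point the client saved earlier.
        if read_point is None:
            read_point = self.latest_write_number()
        self.leases[scanner_id] = (read_point, now + self.lease_seconds)
        return read_point

    def low_watermark(self, now):
        # Smallest read point among scanners whose lease has not timed out.
        live = [rp for rp, expiry in self.leases.values() if expiry > now]
        return min(live, default=self.latest_write_number())

    def scan(self, read_point, start_after=None):
        for row, _, write_number in sorted(self.cells):
            if start_after is not None and row <= start_after:
                continue
            if write_number <= read_point:  # assumed visibility rule
                yield row


def resume_demo():
    cells = [("row050", "v50", 1)]
    server_a = ToyServer(cells)
    read_point = server_a.open_scanner("s1", now=0.0)            # client saves 1
    cells.extend([("row001", "v1", 2), ("row100", "v100", 2)])   # multi put, write number 2
    seen = [next(server_a.scan(read_point))]                     # row050 only

    # Failover: a new server takes over the same cells. The client reopens the
    # scanner with its saved read point and resumes after the last row seen.
    server_b = ToyServer(cells)
    server_b.open_scanner("s1", now=5.0, read_point=read_point)
    seen.extend(server_b.scan(read_point, start_after=seen[-1]))
    return seen, server_b.low_watermark(now=10.0), server_b.low_watermark(now=100.0)
```

With the read point carried by the client, the resumed scan sees neither row001 nor row100, so
the multi-row put stays invisible as a whole. The low watermark stays at the scanner's read
point while its lease is live and moves forward once the lease expires.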

Opened the issue, so that it is tracked. Feel free to pick it up if you like. 
