hbase-issues mailing list archives

From "Lars Hofhansl (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
Date Wed, 14 Mar 2012 06:29:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228998#comment-13228998 ]

Lars Hofhansl edited comment on HBASE-5569 at 3/14/12 6:29 AM:
---------------------------------------------------------------

Well... The whole point of the new API was to have atomic operations.
The Put and the Delete are executed atomically together and become visible at the same time.
Note that the code alternates between putting row and deleting row2, and then putting row2 and
deleting row. The scan then ensures that exactly one column is visible.
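
To make the expected invariant concrete, here is a minimal toy model in plain Java. It is not HBase code and not the actual test: the class name AtomicSwapSketch and its methods are made up for illustration, and the "region" is just an immutable sorted map swapped atomically. It only sketches the idea that a put on one row and a delete on the other, applied as one atomic step, should never let a reader observe 0 or 2 rows.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Toy model only (hypothetical names, not HBase code).
class AtomicSwapSketch {
    // Current "region" contents; every mutation publishes a whole new snapshot.
    private final AtomicReference<SortedMap<String, String>> rows =
        new AtomicReference<>(
            Collections.unmodifiableSortedMap(new TreeMap<String, String>()));

    // Put one row and delete the other in a single atomic step.
    void putAndDelete(String putRow, String deleteRow, String value) {
        rows.updateAndGet(old -> {
            TreeMap<String, String> next = new TreeMap<>(old);
            next.put(putRow, value);
            next.remove(deleteRow);
            return Collections.unmodifiableSortedMap(next);
        });
    }

    // A "scan": count the rows visible in one consistent snapshot.
    int scanCount() {
        return rows.get().size();
    }

    // One writer alternates the two mutations while one reader scans;
    // returns how many scans saw something other than exactly one row.
    static int countViolations(int iterations) {
        AtomicSwapSketch region = new AtomicSwapSketch();
        region.putAndDelete("row", "row2", "value1"); // start with exactly one row
        AtomicInteger violations = new AtomicInteger();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                if (i % 2 == 0) {
                    region.putAndDelete("row2", "row", "value1");
                } else {
                    region.putAndDelete("row", "row2", "value1");
                }
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                if (region.scanCount() != 1) {
                    violations.incrementAndGet();
                }
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return violations.get();
    }

    public static void main(String[] args) {
        System.out.println("violations: " + countViolations(100_000));
    }
}
```

In this model the reader can never see a violation, because each scan reads a single published snapshot. That is the behaviour the test expects from the real scanner.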

In this case the scan *itself* is inconsistent. And worse, as Nicolas (N) found out, even
testRowMutationMultiThreads fails sometimes; that test touches just a single row, so this
should never happen.

So I am not entirely convinced the test is at fault.

For example, take the scenario described above.
If
{code}
Put p = new Put(row2, ts);
p.add(fam1, qual1, value1);
mrm.add(p);
Delete d = new Delete(row);
d.deleteColumns(fam1, qual1, ts);
mrm.add(d);
{code}
happened between 
{code}
region.mutateRowsWithLocks(mrm, rowsToLock);
{code}

and
{code}

Scan s = new Scan(row);
RegionScanner rs = region.getScanner(s);
List<KeyValue> r = new ArrayList<KeyValue>();
while(rs.next(r));
{code}

Both the Put and the Delete would happen atomically, with the same WALEdit and the same MVCC
write point. So the scan will now see the other row (it sees either row or row2, because row,
i.e. RowA, sorts before row2, i.e. RowB).
This has nothing to do with race conditions between threads, but only occurs with flushes
in the test. I'll remove the forced flushes and then run the test again.
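
As a rough illustration of why a shared MVCC write point should make the Put and the Delete visible together, here is a toy sketch in plain Java. It is not HBase's MVCC implementation: the class MvccSketch, its Edit type, and its method names are invented for this example, and it ignores WAL, memstore, and flushes entirely. It only shows the visibility rule: a reader sees an edit if and only if the edit's write number is at or below the read point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy MVCC model only (hypothetical names, not HBase code).
class MvccSketch {
    // One edit, tagged with the write number of the transaction that made it.
    static final class Edit {
        final String row;
        final boolean isDelete;
        final long writeNumber;

        Edit(String row, boolean isDelete, long writeNumber) {
            this.row = row;
            this.isDelete = isDelete;
            this.writeNumber = writeNumber;
        }
    }

    private final List<Edit> edits = new ArrayList<>();
    private long nextWriteNumber = 1;
    private long readPoint = 0; // highest write number visible to readers

    // Stage a Put and a Delete under ONE write number; readers cannot
    // see either edit until that write number is committed.
    synchronized long stagePutAndDelete(String putRow, String deleteRow) {
        long w = nextWriteNumber++;
        edits.add(new Edit(putRow, false, w));
        edits.add(new Edit(deleteRow, true, w));
        return w;
    }

    // Advance the read point so both staged edits become visible at once.
    synchronized void commit(long writeNumber) {
        readPoint = Math.max(readPoint, writeNumber);
    }

    // A "scan" at the current read point: replay the visible edits in
    // write order and return the rows that are still live, sorted.
    synchronized List<String> visibleRows() {
        TreeSet<String> live = new TreeSet<>();
        for (Edit e : edits) {
            if (e.writeNumber > readPoint) {
                continue; // not yet committed, invisible to this scan
            }
            if (e.isDelete) {
                live.remove(e.row);
            } else {
                live.add(e.row);
            }
        }
        return new ArrayList<>(live);
    }
}
```

Under this rule a scan sees either the state before the mutation or the state after it, never a mix. That is why a scan returning 0 or 2 KVs points at the scanning side rather than at the writers.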

                
      was (Author: lhofhansl):
    Well... The whole point of the new API was to have atomic operations.
The Put and the Delete are executed atomically together and become visible at the same time.
Note that the code alternates between putting row and deleting row2, and then putting row2 and
deleting row. The scan then ensures that exactly one column is visible.

In this case the scan *itself* is inconsistent. And worse, as Nicolas (N) found out, even
testRowMutationMultiThreads fails sometimes; that test touches just a single row, so this
should never happen.

So I am not entirely convinced the test is at fault.

For example the scenario described above if Between the time thread1 execute
if
{code}
Put p = new Put(row2, ts);
p.add(fam1, qual1, value1);
mrm.add(p);
Delete d = new Delete(row);
d.deleteColumns(fam1, qual1, ts);
mrm.add(d);
{code}
happened between 
{code}
region.mutateRowsWithLocks(mrm, rowsToLock);
{code}

and
{code}

Scan s = new Scan(row);
RegionScanner rs = region.getScanner(s);
List<KeyValue> r = new ArrayList<KeyValue>();
while(rs.next(r));
{code}

Both the Put and the Delete would happen atomically with the same WALEdit and the same MVCC
writepoint. So the scan will now see the other row.
This has nothing to do with race conditions between threads, but only occurs with flushes
in the test. I'll remove the forced flushes and then run the test again.
                  
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: TestAtomicOperation-output.trunk_120313.rar
>
>
> What I pieced together so far is that it is the *scanning* side that sometimes has problems.
> Every time I see an assertion failure in the log, I see this before it:
> {quote}
> 2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0
> {quote}
> The order of the Put and the Delete is sometimes reversed.
> The test threads should always see exactly one KV. If the "before" was the Put, the threads see 0 KVs; if the "before" was the Delete, the threads see 2 KVs.
> This debug message comes from StoreScanner.checkReseek. It seems we still have some consistency issue with scanning sometimes :(


        
