hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7774) RegionObserver.prePut() cannot rely on the Put's timestamps, can even cause data loss
Date Tue, 05 Feb 2013 22:03:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571780#comment-13571780 ]

Jean-Daniel Cryans commented on HBASE-7774:
-------------------------------------------

Shell example for those more visual like me:

{code}
hbase(main):001:0> scan 't'
ROW                                           COLUMN+CELL                             
0 row(s) in 0.5100 seconds

hbase(main):002:0> put 't', '1', 'cf:u', 'the row we are checking in the coproc'
0 row(s) in 0.0530 seconds

hbase(main):003:0> scan 't'
ROW                                           COLUMN+CELL
 1                                            column=cf:txt, timestamp=1360092298333, value=cell found
1 row(s) in 0.0190 seconds
{code}

The cell we inserted ({{cf:u}}) is missing from the scan; only the {{cf:txt}} cell added by the coprocessor shows up.
                
> RegionObserver.prePut() cannot rely on the Put's timestamps, can even cause data loss
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-7774
>                 URL: https://issues.apache.org/jira/browse/HBASE-7774
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.2, 0.96.0, 0.94.4
>            Reporter: Jean-Daniel Cryans
>            Priority: Critical
>
> A user had code like this in a coprocessor's prePut():
> {code}
> if (put.has(expectedKv))
>   put.add(kvSayingIFoundIt);
> else
>   put.add(kvSayingNotFound);
> {code}
> If you have MSLAB turned *off*, and you have the {{expectedKv}} in your {{Put}}, doing a {{Get}} following your insert will only return {{kvSayingIFoundIt}} and not the KV you were actually inserting.
> Moreover, if you only call {{put.has(expectedKv)}} and add nothing, you will not get anything back. Your data seems to be gone.
> The reason is that in {{prePut()}} the timestamp hasn't been set yet, so calling {{kv.getTimestamp()}} during the comparisons in {{put.has()}} will populate {{kv.timestampCache}} with {{Long.MAX_VALUE}}. The KV then stays in the {{MemStore}} with that big cached timestamp and is filtered out because {{TimeRange}} will compare {{Long.MAX_VALUE}} >= {{Long.MAX_VALUE}} and return {{SKIP}}.
> And the reason it works correctly with MSLAB *on* is that the KV is cloned in {{maybeCloneWithAllocator()}} and the cache is reset.
> Now, I think this has bigger implications. Basically, you can't rely on the timestamp at all in {{prePut()}}. I'm sure this can screw someone else in a creative way later.
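
For readers who want to see the mechanism in isolation, here is a minimal sketch of the stale-cache behavior described above. It is a toy model, not HBase code: {{ToyKeyValue}}, {{updateLatestStamp()}}, {{cloneWithFreshCache()}}, {{withinTimeRange()}} and {{visibleAfterPut()}} are hypothetical names made up for illustration, and the model assumes (as the description implies) that assigning the real timestamp does not invalidate an already-populated cache.

```java
public class TimestampCacheSketch {
    // Placeholder meaning "timestamp not assigned yet" (assumption: mirrors Long.MAX_VALUE in the description).
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    /** Toy stand-in for a KeyValue: a timestamp plus a lazily filled cache. Hypothetical class. */
    static final class ToyKeyValue {
        private long timestamp = LATEST_TIMESTAMP; // server has not assigned a time yet
        private long timestampCache = -1L;         // -1 means "nothing cached"

        /** The first call caches whatever the timestamp is at that moment. */
        long getTimestamp() {
            if (timestampCache == -1L) {
                timestampCache = timestamp;
            }
            return timestampCache;
        }

        /** Server-side assignment of the real timestamp; the cache is left untouched. */
        void updateLatestStamp(long now) {
            if (timestamp == LATEST_TIMESTAMP) {
                timestamp = now;
            }
        }

        /** Models the MSLAB-on path: the copy carries the timestamp but starts with an empty cache. */
        ToyKeyValue cloneWithFreshCache() {
            ToyKeyValue copy = new ToyKeyValue();
            copy.timestamp = this.timestamp;
            return copy;
        }
    }

    /** Half-open range check [min, max): a timestamp equal to max is rejected. */
    static boolean withinTimeRange(long ts, long min, long max) {
        return ts >= min && ts < max;
    }

    /** Runs one toy "put" and reports whether a default read would see the cell. */
    static boolean visibleAfterPut(boolean readTimestampInPrePut, boolean cloneBeforeStore) {
        ToyKeyValue kv = new ToyKeyValue();
        if (readTimestampInPrePut) {
            kv.getTimestamp(); // what put.has() does indirectly while comparing KVs
        }
        kv.updateLatestStamp(1360092298333L);
        ToyKeyValue stored = cloneBeforeStore ? kv.cloneWithFreshCache() : kv;
        return withinTimeRange(stored.getTimestamp(), 0L, Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println("read in prePut, no clone: " + visibleAfterPut(true, false));
        System.out.println("read in prePut, cloned:   " + visibleAfterPut(true, true));
        System.out.println("no read in prePut:        " + visibleAfterPut(false, false));
    }
}
```

In this model only the first case hides the cell, which matches the MSLAB off/on difference reported above. It says nothing about how the real classes should be fixed.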

