hbase-issues mailing list archives

From "Mathias Herberts (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15487) Deletions done via BulkDeleteEndpoint make past data re-appear
Date Sat, 19 Mar 2016 12:47:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202745#comment-15202745 ]

Mathias Herberts commented on HBASE-15487:
------------------------------------------

From thinking more about it, I guess the problem is probably related more to the way 'VERSIONS'
is enforced than to the delete operation itself.

By setting 'VERSIONS' at table creation time it is expected that only that many versions of
a cell will be retained. We assume 'VERSIONS' was set to 1 for the purpose of the present
explanation.

The 'VERSIONS' parameter seems to be enforced during a Scan, even if the data being scanned
is still in the memstore, since a Scan that requests more than 'VERSIONS' versions
won't return more than 'VERSIONS' versions of the cell.

When issuing a Delete against a cell which was written more than 'VERSIONS' times, one would
expect the deletion to remove all versions of the cell, since no versions past the last
one will be retained at compaction time.

But it seems that only the last version is deleted, and the one prior to that then becomes
visible again even though it was not visible before (a Scan with setMaxVersions() would not
return it).
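
To make the suspected interaction concrete, here is a toy model in plain Java. It is not
HBase code and does not call any HBase API: the class VersionMaskingModel and its methods
are hypothetical names, and it rests on the assumption that delete markers are applied
before the 'VERSIONS' limit is counted at read time. Under that assumption it reproduces
what I observe: with 'VERSIONS' set to 1, deleting only the newest version uncovers the
older one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of one cell's versions. NOT HBase code; all names are hypothetical.
class VersionMaskingModel {
    // timestamp -> value for every write still physically present (e.g. in the memstore)
    private final TreeMap<Long, Integer> puts = new TreeMap<>();
    // timestamps masked by a delete that targets exactly one version
    private final TreeSet<Long> versionDeletes = new TreeSet<>();
    // a delete of all versions masks every timestamp <= this value
    private long columnDeleteTs = Long.MIN_VALUE;
    // stands in for the 'VERSIONS' setting of the column family
    private final int maxVersions;

    VersionMaskingModel(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long ts, int value) {
        puts.put(ts, value);
    }

    // Models a delete of a single version (DeleteType 'VERSION' in the report).
    void deleteVersion(long ts) {
        versionDeletes.add(ts);
    }

    // Models a delete of all versions up to and including ts.
    void deleteColumn(long ts) {
        columnDeleteTs = Math.max(columnDeleteTs, ts);
    }

    // Returns visible values, newest first. Assumption: masked versions are
    // skipped BEFORE they count against maxVersions.
    List<Integer> scan() {
        List<Integer> visible = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : puts.descendingMap().entrySet()) {
            long ts = e.getKey();
            if (versionDeletes.contains(ts) || ts <= columnDeleteTs) {
                continue; // masked by a delete marker
            }
            if (visible.size() == maxVersions) {
                break; // 'VERSIONS' limit reached
            }
            visible.add(e.getValue());
        }
        return visible;
    }

    public static void main(String[] args) {
        VersionMaskingModel cell = new VersionMaskingModel(1);
        cell.put(1L, 0x0);
        cell.put(2L, 0x1);
        System.out.println(cell.scan()); // [1]: only the newest version is visible
        cell.deleteVersion(2L);
        System.out.println(cell.scan()); // [0]: the older value re-appears
        cell.deleteColumn(2L);
        System.out.println(cell.scan()); // []: deleting all versions hides the cell
    }
}
```

In this model the older write is never physically removed before the delete arrives, which
matches the case where everything still sits in the memstore. If the limit were enforced
at write or flush time instead, the version delete would leave nothing behind.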


> Deletions done via BulkDeleteEndpoint make past data re-appear
> --------------------------------------------------------------
>
>                 Key: HBASE-15487
>                 URL: https://issues.apache.org/jira/browse/HBASE-15487
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.0.3
>            Reporter: Mathias Herberts
>         Attachments: HBaseTest.java, HBaseTest.java
>
>
> The Warp10 (www.warp10.io) time series database uses HBase as its underlying data store.
> The deletion of ranges of cells is performed using the BulkDeleteEndpoint.
> In the following scenario the deletion does not appear to be working properly:
> The table 't' is created with a single version using:
> create 't', {NAME => 'v', DATA_BLOCK_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'NONE',
> REPLICATION_SCOPE => '0', VERSIONS => '1', MIN_VERSIONS => '0', TTL => '2147483647',
> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> BLOCKCACHE => 'true'}
> We write a cell at row '0x00', colfam 'v', colq '', value 0x0
> We write the same cell again with value 0x1
> A scan will return a single value 0x1
> We then perform a delete using the BulkDeleteEndpoint and a Scan with a DeleteType of
> 'VERSION'
> The reported number of deleted versions is 1 (which is consistent given the table was created
> with MAX_VERSIONS=1)
> The same scan as the one performed before the delete returns a single value 0x0.
> This seems to happen when all operations are performed against the memstore.
> A regular delete will remove the cell and a later scan won't show it.
> I'll attach a test which demonstrates the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
