hbase-issues mailing list archives

From "Lars Hofhansl (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-4536) Allow CF to retain deleted rows
Date Thu, 06 Oct 2011 21:07:29 GMT

     [ https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-4536:

    Fix Version/s:     (was: 0.92.0)

Turns out this is a bit more complicated than I thought. There are three types of deletes:
# version deletes - effective for a specific version of a specific column
# column deletes - effective for all versions of a specific column
# family deletes - effective for all versions of all columns of a family

The first two sort before the puts they affect, based on their respective timestamps, but
after newer puts.
Family deletes always sort before all versions of all columns.
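
To make the ordering above concrete, here is a minimal, self-contained sketch. It is a simplified model, not HBase's actual comparator: the class, enum, and method names are hypothetical, and it assumes a family delete is modeled with an empty qualifier so that it sorts ahead of every column regardless of its timestamp.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the sort order described above (hypothetical names, not HBase code).
final class DeleteSortSketch {
    // Declared so that, at equal timestamps, every delete type compares higher than PUT.
    enum Type { PUT, VERSION_DELETE, COLUMN_DELETE, FAMILY_DELETE }

    static final class Cell {
        final String qualifier; // assumption: empty for a family delete
        final long timestamp;
        final Type type;

        Cell(String qualifier, long timestamp, Type type) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.type = type;
        }

        @Override
        public String toString() {
            return type + "(" + qualifier + "@" + timestamp + ")";
        }
    }

    // Order: qualifier ascending, then timestamp descending (newest first),
    // then delete markers before puts at the same timestamp.
    static int compare(Cell a, Cell b) {
        int byQualifier = a.qualifier.compareTo(b.qualifier);
        if (byQualifier != 0) {
            return byQualifier;
        }
        int byTimestamp = Long.compare(b.timestamp, a.timestamp);
        if (byTimestamp != 0) {
            return byTimestamp;
        }
        return b.type.compareTo(a.type);
    }

    // Three puts on column "a", a column delete at ts 20, and a family delete at ts 5.
    static List<Cell> sortedExample() {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("a", 10L, Type.PUT));
        cells.add(new Cell("a", 20L, Type.PUT));
        cells.add(new Cell("a", 30L, Type.PUT));
        cells.add(new Cell("a", 20L, Type.COLUMN_DELETE));
        cells.add(new Cell("", 5L, Type.FAMILY_DELETE));
        cells.sort(DeleteSortSketch::compare);
        return cells;
    }

    public static void main(String[] args) {
        for (Cell c : sortedExample()) {
            System.out.println(c);
        }
    }
}
```

In this model the column delete at timestamp 20 lands after the put at 30 and before the puts at 20 and 10, while the family delete at timestamp 5 still comes first.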

The problem is deciding when the delete rows (the marker rows) themselves can be removed
during a major compaction.

For #1 and #2 I can just do version counting, and newer puts will eventually push the
delete markers out of the store.
With #3 this will never happen, as family deletes always sort before all puts of the same family,
regardless of any timestamp set on them.
Here it is necessary to scan all puts for that family and then decide whether the delete needs
to be included based on whether it had any effect on any of the puts in the same family.
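
The two checks above could be sketched roughly as follows. This is only one possible reading of the rules, not the implementation: the class and method names are hypothetical, the version-counting rule (a marker is pushed out once at least maxVersions newer puts exist in its column) is an assumption, and a family delete is taken to mask every put whose timestamp is at or below its own.

```java
// Hypothetical sketch of the marker-retention decisions; not HBase code.
final class MarkerRetentionSketch {

    // Cases #1 and #2, under an assumed counting rule: the marker is pushed out
    // once at least maxVersions puts newer than the marker exist in its column.
    static boolean pushedOutByNewerPuts(long markerTs, long[] columnPutTimestamps, int maxVersions) {
        int newer = 0;
        for (long ts : columnPutTimestamps) {
            if (ts > markerTs) {
                newer++;
            }
        }
        return newer >= maxVersions;
    }

    // Case #3: version counting never reaches a family delete, so scan every put
    // in the family and report whether the marker masks at least one of them.
    static boolean familyDeleteAffectsAnyPut(long markerTs, long[] familyPutTimestamps) {
        for (long ts : familyPutTimestamps) {
            if (ts <= markerTs) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] puts = {10L, 20L, 30L};
        // Two puts (20, 30) are newer than a marker at 15, so with maxVersions = 2 it is pushed out.
        System.out.println(pushedOutByNewerPuts(15L, puts, 2));
        // A family delete at 5 is older than every put, so it affects none of them.
        System.out.println(familyDeleteAffectsAnyPut(5L, puts));
        // A family delete at 25 masks the puts at 10 and 20.
        System.out.println(familyDeleteAffectsAnyPut(25L, puts));
    }
}
```

The family check needs the full set of put timestamps for the family before it can answer, which is why it costs more than per-column version counting.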

Because of this, I am moving this out of 0.92, as the changes will be bigger than I thought. Put it back if you think otherwise.

I still think that time travel is an important feature of HBase, and that it is incomplete
if it cannot include deleted rows.

> Allow CF to retain deleted rows
> -------------------------------
>                 Key: HBASE-4536
>                 URL: https://issues.apache.org/jira/browse/HBASE-4536
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>    Affects Versions: 0.92.0
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
> Parent allows for a cluster to retain rows for a TTL or keep a minimum number of versions.
> However, if a client deletes a row, all versions older than the delete tombstone will
> be removed at the next major compaction (and even at memstore flush - see HBASE-4241).
> There should be a way to retain those versions to guard against software error.
> I see two options here:
> 1. Add a new flag to HColumnDescriptor. Something like "RETAIN_DELETED".
> 2. Fold this into the parent change, i.e. keep the minimum number of versions
> even past the delete marker.
> #1 would allow for more flexibility. #2 comes somewhat naturally with parent (from a
> user viewpoint).
> Comments? Any other options?


