hbase-issues mailing list archives

From "Lars Hofhansl (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows
Date Mon, 10 Oct 2011 18:30:30 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124372#comment-13124372 ]

Lars Hofhansl commented on HBASE-4536:

This issue is pretty resistant to all my attempts to solve it. :(

Was toying with keeping track of the first key written to a file (we currently keep track
of the last key).
However, since the family delete hides all puts with a TS less than *or equal to* the delete
TS, this won't help: the delete marker itself would eventually be the first key, and it might
still affect puts later in the file.

Also, KVs are sorted in reverse time order (except family delete markers), so I cannot just
look at the put directly following the delete marker, because it'll be the newest put rather
than the oldest.
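The ordering problem can be sketched with a simplified comparator. This assumes a single row and family, and assumes (as in HBase) that the family delete marker carries an empty qualifier, so it sorts ahead of every column. SimpleKV and ORDER are hypothetical stand-ins, not the real KeyValue or its comparator.

```java
import java.util.Comparator;

/** Simplified stand-in for a KeyValue in one family; not the HBase class. */
final class SimpleKV {
    final String row;
    final String qualifier;   // empty string for a family delete marker (assumption)
    final long ts;
    final boolean familyDelete;

    SimpleKV(String row, String qualifier, long ts, boolean familyDelete) {
        this.row = row;
        this.qualifier = qualifier;
        this.ts = ts;
        this.familyDelete = familyDelete;
    }

    @Override
    public String toString() {
        return (familyDelete ? "DeleteFamily" : "Put " + qualifier) + "@" + ts;
    }

    /** Row and qualifier ascending, then newest timestamp first, then markers before puts. */
    static final Comparator<SimpleKV> ORDER =
        Comparator.comparing((SimpleKV kv) -> kv.row)
            .thenComparing(kv -> kv.qualifier)
            .thenComparing(Comparator.comparingLong((SimpleKV kv) -> kv.ts).reversed())
            .thenComparing(kv -> !kv.familyDelete);
}
```

Sorting a family delete at TS 20 together with puts c1@10, c1@30 and c2@5 yields DeleteFamily@20, Put c1@30, Put c1@10, Put c2@5. The put right after the marker is c1@30, the newest version and not even one the marker hides, while the masked puts appear further down.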

So there are three options I think:
1. Only allow the new flag on CFs that have a TTL set. MIN_VERSIONS would not apply to deleted
rows or delete marker rows (we wouldn't know how long to keep family deletes in that case). (MAX)VERSIONS
would still be enforced on all row types except for family delete markers.
2. Translate family delete markers to column delete markers at (major) compaction time.
3. Change HFileWriterV* to keep track of the earliest put TS in a store and write it to the
file metadata. Use that to expire delete markers that are older and hence can't affect any
puts in the file.
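Option 3 could look roughly like the following Java sketch, assuming the writer is told for each appended cell whether it is a put. EarliestPutTracker, observe, fileInfo and the EARLIEST_PUT_TS key are all hypothetical; HFileWriterV* has no such hook or metadata key today.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tracker for option 3; not an existing HFileWriter API. */
final class EarliestPutTracker {
    /** Hypothetical metadata key; HBase defines no such key. */
    static final String EARLIEST_PUT_TS_KEY = "EARLIEST_PUT_TS";

    private long earliestPutTs = Long.MAX_VALUE;

    /** Called for every cell appended to the file; only puts move the minimum. */
    void observe(long ts, boolean isPut) {
        if (isPut && ts < earliestPutTs) {
            earliestPutTs = ts;
        }
    }

    /** Metadata entry the writer would persist when the file is closed. */
    Map<String, Long> fileInfo() {
        Map<String, Long> info = new HashMap<>();
        info.put(EARLIEST_PUT_TS_KEY, earliestPutTs);
        return info;
    }

    /**
     * A family delete hides puts with ts <= marker ts, so a marker strictly older than
     * the earliest put cannot hide anything and could be expired.
     */
    static boolean canExpire(long familyDeleteTs, long earliestPutTs) {
        return familyDeleteTs < earliestPutTs;
    }
}
```

The comparison must be strict: a marker whose TS equals the earliest put TS still hides that put. The isPut argument is exactly the "understand KVs" requirement noted for #3. This sketch covers one file only; across a store the minimum would have to be taken over every file the marker could reach.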

None of these are particularly attractive. #1 is limiting, #2 might get expensive if there are
many columns, #3 would require the FileWriters to understand KVs.
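The cost concern with #2 is easy to see in a sketch: one family delete turns into one column delete per qualifier in the row. FamilyDeleteTranslation and its ColumnDelete type are hypothetical illustrations, not HBase classes, and the sketch assumes the qualifiers present in the row are already known at compaction time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

/** Hypothetical sketch of option 2; ColumnDelete is illustrative, not HBase's type. */
final class FamilyDeleteTranslation {

    static final class ColumnDelete {
        final String qualifier;
        final long ts;   // hides versions of this column with ts <= this value

        ColumnDelete(String qualifier, long ts) {
            this.qualifier = qualifier;
            this.ts = ts;
        }

        @Override
        public String toString() {
            return "DeleteColumn " + qualifier + "@" + ts;
        }
    }

    /** Emits one column delete marker per qualifier: O(#columns) markers per family delete. */
    static List<ColumnDelete> translate(long familyDeleteTs, SortedSet<String> qualifiers) {
        List<ColumnDelete> out = new ArrayList<>(qualifiers.size());
        for (String q : qualifiers) {
            out.add(new ColumnDelete(q, familyDeleteTs));
        }
        return out;
    }
}
```

A row with three columns gets three markers, and a wide row gets as many markers as it has columns. Note that only qualifiers present in the row at compaction time receive a marker in this sketch.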

> Allow CF to retain deleted rows
> -------------------------------
>                 Key: HBASE-4536
>                 URL: https://issues.apache.org/jira/browse/HBASE-4536
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>    Affects Versions: 0.92.0
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
> Parent allows for a cluster to retain rows for a TTL or keep a minimum number of versions.
> However, if a client deletes a row, all versions older than the delete tombstone will
> be removed at the next major compaction (and even at memstore flush - see HBASE-4241).
> There should be a way to retain those versions to guard against software error.
> I see two options here:
> 1. Add a new flag to HColumnDescriptor. Something like "RETAIN_DELETED".
> 2. Fold this into the parent change, i.e. keep the minimum number of versions
> even past the delete marker.
> #1 would allow for more flexibility. #2 comes somewhat naturally with the parent (from a
> user viewpoint).
> Comments? Any other options?

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

