hbase-issues mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2406) Define semantics of cell timestamps/versions
Date Mon, 19 Jul 2010 17:25:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889926#action_12889926 ]

Jonathan Gray commented on HBASE-2406:

bq. Gets lack "...the ability to retrieve the latest version less than or equal to a given
timestamp, thus giving the 'latest' state of the record at a certain point in time."
I commented on this on the blog post. This is not the case: we do support this by setting
the max of the Get's time range to timestamp+1 (the upper bound of the range is exclusive).
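A minimal sketch of that point, assuming (as with the HBase client's `Get.setTimeRange(minStamp, maxStamp)`) that a time range includes its minimum and excludes its maximum. The class `TimeRangeSketch` and its methods are hypothetical and only model one cell's versions in memory; this is not HBase code.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory model of a single cell's versions (not the HBase API). */
public class TimeRangeSketch {
    // timestamp -> value; TreeMap keeps timestamps sorted ascending
    private final NavigableMap<Long, String> versions = new TreeMap<>();

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    /** Latest version with minStamp <= ts < maxStamp (half-open range), or null if none. */
    String get(long minStamp, long maxStamp) {
        if (maxStamp <= minStamp) {
            return null; // empty range
        }
        // headMap(maxStamp, false) keeps ts < maxStamp; tailMap(minStamp, true) keeps ts >= minStamp
        NavigableMap<Long, String> window =
            versions.headMap(maxStamp, false).tailMap(minStamp, true);
        Map.Entry<Long, String> latest = window.lastEntry();
        return latest == null ? null : latest.getValue();
    }

    /** State of the record "as of" a point in time: latest version with ts <= asOf. */
    String getAsOf(long asOf) {
        // The exclusive upper bound is asOf + 1, so a version written at exactly asOf is eligible.
        return get(0L, asOf + 1);
    }

    public static void main(String[] args) {
        TimeRangeSketch cell = new TimeRangeSketch();
        cell.put(100L, "v100");
        cell.put(200L, "v200");
        cell.put(300L, "v300");
        System.out.println(cell.getAsOf(200L)); // v200
        System.out.println(cell.getAsOf(250L)); // v200
        System.out.println(cell.get(0L, 200L)); // v100: 200 itself is excluded
    }
}
```

Asking for the state as of 200 returns v200 because the exclusive bound becomes 201; passing 200 itself as the maximum would skip that version and return v100.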

bq. Major compactions are not invisible to the user
This is hard to fix, and it's not clear what the "expected" behavior should be. Do you ever resurface
a Put once it's been hidden? There seem to be arguments on both sides. If I want to keep the latest
two versions, I might have accidentally inserted a bad version, so I want to delete it and resurface
an older one. But someone else may argue that a value should never be able to reappear after
being shadowed.
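To illustrate why a major compaction can change what a read returns, here is a hypothetical in-memory model of one column with a max-versions limit of 2. It assumes that version deletes are only applied as markers at read time, and that both the delete markers and the version limit are only enforced physically when a major compaction rewrites the data. `CompactionVisibilitySketch` and its methods are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical model of one column with a max-versions limit (not HBase code). */
public class CompactionVisibilitySketch {
    private final int maxVersions;
    // timestamp -> value, iterated newest first
    private final TreeMap<Long, String> cells = new TreeMap<>(Collections.<Long>reverseOrder());
    // timestamps hidden by version-specific delete markers
    private final Set<Long> deleteMarkers = new HashSet<>();

    CompactionVisibilitySketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        cells.put(timestamp, value);
    }

    void deleteVersion(long timestamp) {
        deleteMarkers.add(timestamp);
    }

    /** Read path: skip deleted versions, return at most maxVersions of the rest. */
    List<String> read() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, String> e : cells.entrySet()) {
            if (result.size() == maxVersions) {
                break;
            }
            if (!deleteMarkers.contains(e.getKey())) {
                result.add(e.getValue());
            }
        }
        return result;
    }

    /** Major compaction: drop deleted versions and markers, then anything past maxVersions. */
    void majorCompact() {
        cells.keySet().removeAll(deleteMarkers);
        deleteMarkers.clear();
        while (cells.size() > maxVersions) {
            cells.pollLastEntry(); // reverse order, so the last entry is the oldest
        }
    }

    public static void main(String[] args) {
        CompactionVisibilitySketch noCompaction = new CompactionVisibilitySketch(2);
        noCompaction.put(1, "v1");
        noCompaction.put(2, "v2");
        noCompaction.put(3, "v3");
        noCompaction.deleteVersion(3);
        System.out.println(noCompaction.read()); // [v2, v1]: v1 resurfaces

        CompactionVisibilitySketch compacted = new CompactionVisibilitySketch(2);
        compacted.put(1, "v1");
        compacted.put(2, "v2");
        compacted.put(3, "v3");
        compacted.majorCompact(); // v1 is physically dropped here
        compacted.deleteVersion(3);
        System.out.println(compacted.read()); // [v2]
    }
}
```

Both scenarios issue the same puts and the same delete, but only the uncompacted one resurfaces v1. In this model the answer depends on whether a major compaction happened to run in between, which is exactly the visibility problem described above.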

I think the most important fix is to handle duplicate versions (multiple writes with the same
timestamp), ordering them by insertion time using memstoreTS and storefile stamps.
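One way to express the proposed ordering is a comparator: newest timestamp first, and for equal timestamps the later insertion first. In this sketch a per-write sequence number stands in for memstoreTS and storefile stamps; `DuplicateVersionSketch`, `Version`, and `seqId` are hypothetical names, and this is not how HBase actually stores cells.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of duplicate-timestamp handling (not HBase code). */
public class DuplicateVersionSketch {
    /** One write; seqId is a stand-in for memstoreTS / storefile stamps. */
    static final class Version {
        final long timestamp;
        final long seqId;
        final String value;

        Version(long timestamp, long seqId, String value) {
            this.timestamp = timestamp;
            this.seqId = seqId;
            this.value = value;
        }
    }

    // Newest timestamp first; for equal timestamps, the later insertion (higher seqId) first.
    static final Comparator<Version> READ_ORDER =
        Comparator.comparingLong((Version v) -> v.timestamp).reversed()
            .thenComparing(Comparator.comparingLong((Version v) -> v.seqId).reversed());

    private final List<Version> versions = new ArrayList<>();
    private long nextSeqId = 0;

    void put(long timestamp, String value) {
        // Every write is kept; the sequence number records insertion order.
        versions.add(new Version(timestamp, nextSeqId++, value));
    }

    /** Return up to maxVersions values in read order. */
    List<String> get(int maxVersions) {
        List<Version> sorted = new ArrayList<>(versions);
        sorted.sort(READ_ORDER);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(maxVersions, sorted.size()); i++) {
            result.add(sorted.get(i).value);
        }
        return result;
    }

    public static void main(String[] args) {
        DuplicateVersionSketch cell = new DuplicateVersionSketch();
        cell.put(100, "a");
        cell.put(100, "b"); // same timestamp, inserted later
        cell.put(50, "c");  // older timestamp, written last
        System.out.println(cell.get(1)); // [b]
        System.out.println(cell.get(3)); // [b, a, c]
    }
}
```

The write at timestamp 50 arrives after the two writes at 100 but still sorts last, since ordering is by timestamp first and insertion order only breaks ties.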

For the other issues, it is less clear what the "right" answer should be. I also don't think we can
completely nail this down until we make a firm decision about what should and should not be
processed during minor compactions. I did some preliminary benchmarking of minor compactions
a couple of months back and am hoping to have an intern pick that work back up so we can
make a decision here.

> Define semantics of cell timestamps/versions
> --------------------------------------------
>                 Key: HBASE-2406
>                 URL: https://issues.apache.org/jira/browse/HBASE-2406
>             Project: HBase
>          Issue Type: Task
>          Components: documentation
>            Reporter: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.90.0
> There is a lot of general confusion over the semantics of the cell timestamp. In particular,
> a couple of questions often come up:
> - If multiple writes to a cell have the same timestamp, are all versions maintained or
>   just the last?
> - Is it OK to write cells in non-increasing timestamp order?
> Let's discuss, figure out what semantics make sense, and then move towards (a) documentation
> and (b) unit tests that prove we have those semantics.

This message is automatically generated by JIRA.
