hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9905) Enable using seqId as timestamp
Date Wed, 21 May 2014 00:30:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004155#comment-14004155 ]

Andrew Purtell commented on HBASE-9905:

bq. What about features like TTL that are based on wall clock time? Are we going to unsupport
them (unless the client sets the TSs)?

I think auto expiration and background garbage collection of data is still a valid use case
for HBase. We would need an option that selects timestamps set by server time. If this option
is active we'd filter expired cells based on timestamp and TTL settings as today. If the user
has chosen some other type of timestamp scheme, it's ok for TTL settings to be ignored or
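
A minimal sketch of the behavior described above, assuming a hypothetical per-table or per-family
timestamp scheme setting. The names TimestampScheme, SERVER_TIME, SEQ_ID and isExpired are
illustrative only and are not HBase APIs. It shows only the "TTL is ignored" branch for
non-wall-clock schemes.

```java
/** Hypothetical timestamp schemes; these names are assumptions, not HBase identifiers. */
enum TimestampScheme { SERVER_TIME, SEQ_ID }

final class TtlFilterSketch {
    /**
     * Returns true if a cell should be filtered out as expired.
     * TTL is honored only when cell timestamps are server wall clock time.
     */
    static boolean isExpired(TimestampScheme scheme, long cellTimestamp,
                             long ttlMillis, long nowMillis) {
        if (scheme != TimestampScheme.SERVER_TIME) {
            // The timestamp is not a wall clock value, so the cell's age
            // cannot be derived from it: the TTL setting is ignored.
            return false;
        }
        // Wall clock timestamps: expire once the cell is older than the TTL.
        return nowMillis - cellTimestamp > ttlMillis;
    }
}
```

With SERVER_TIME, a cell written at 1000 with a TTL of 500 is expired at 2000. With SEQ_ID, the
same call returns false regardless of the values.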

bq. Or maybe we can have a more coarse grained TTL where for every hfile from flush we keep
the timestamp, and delete hfiles in compaction once expired.

The trouble with this is that it could be too coarse grained. If a region is not taking writes
and major compaction is disabled, it could carry those 'expired' HFiles for a long time.
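
To make the concern concrete, here is a rough sketch of the file-level scheme from the quoted
suggestion. FlushedFile and compact are hypothetical stand-ins, not HBase classes or methods. The
point is that expiry is evaluated only when a compaction actually runs.

```java
import java.util.ArrayList;
import java.util.List;

final class FileLevelTtlSketch {
    /** Hypothetical stand-in for HFile metadata: a name and the wall clock flush time. */
    static final class FlushedFile {
        final String name;
        final long flushTimeMillis;

        FlushedFile(String name, long flushTimeMillis) {
            this.name = name;
            this.flushTimeMillis = flushTimeMillis;
        }
    }

    /**
     * Runs only as part of a compaction. Returns the files that survive,
     * dropping every file whose flush time is older than the TTL.
     */
    static List<FlushedFile> compact(List<FlushedFile> files, long ttlMillis, long nowMillis) {
        List<FlushedFile> kept = new ArrayList<>();
        for (FlushedFile f : files) {
            if (nowMillis - f.flushTimeMillis <= ttlMillis) {
                kept.add(f);
            }
        }
        return kept;
    }
}
```

Nothing is removed until compact is invoked, so a region with no writes and no scheduled
compaction keeps its expired files indefinitely in this model.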

> Enable using seqId as timestamp 
> --------------------------------
>                 Key: HBASE-9905
>                 URL: https://issues.apache.org/jira/browse/HBASE-9905
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Enis Soztutar
> This has been discussed previously, and Lars H. mentioned an idea for the client
> to declare explicitly whether timestamps are used or not.
> The problem is that, for data models not using timestamps, we are still relying on clocks
> to order the updates. Clock skew, same-millisecond puts after deletes, etc. can cause unexpected
> behavior and data not being visible.
> We should have a table descriptor / family property, which would declare that the data
> model does not use timestamps. Then we can populate this dimension with the seqId, so that
> global ordering of edits is not affected by wall clock.
> For example, META will use this. 
> Once we have something like this, we can think of making it default for new tables, so
> that the unknowing user will not shoot herself in the foot.
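
As an illustration of the "same millisecond puts after deletes" problem in the description, the
following is a deliberately simplified model, not HBase code. It assumes a delete marker masks
every put whose timestamp is less than or equal to the marker's timestamp, and that seqIds are
strictly increasing per edit. The Edit class and both method names are hypothetical.

```java
final class EditOrderingSketch {
    /** Hypothetical edit: a wall clock timestamp plus a strictly increasing sequence id. */
    static final class Edit {
        final long wallClockMillis;
        final long seqId;

        Edit(long wallClockMillis, long seqId) {
            this.wallClockMillis = wallClockMillis;
            this.seqId = seqId;
        }
    }

    /** Wall clock ordering: the delete masks any put with a timestamp <= its own. */
    static boolean putVisibleByWallClock(Edit put, Edit delete) {
        return put.wallClockMillis > delete.wallClockMillis;
    }

    /** seqId ordering: the put is visible whenever it was applied after the delete. */
    static boolean putVisibleBySeqId(Edit put, Edit delete) {
        return put.seqId > delete.seqId;
    }
}
```

For a delete at (1000 ms, seqId 7) followed by a put at (1000 ms, seqId 8), the wall clock check
hides the put even though it was written later, while the seqId check keeps it visible.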
