hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Delete all data before a given timestamp
Date Tue, 16 Jul 2013 03:52:01 GMT
You might be interested in HBASE-8784 (https://issues.apache.org/jira/browse/HBASE-8784).

----- Original Message -----
From: Chao Shi <stepinto@live.com>
To: user@hbase.apache.org
Sent: Monday, July 15, 2013 8:07 PM
Subject: Re: Delete all data before a given timestamp

Jean-Marc Spaggiari <jean-marc@...> writes:

> When you send a delete command to the server, you can specify a timestamp.
> So as the result of your MR job, could you "just" emit a delete with that
> timestamp to remove any previous versions?
> JM
> 2013/7/15 Chao Shi <stepinto@...>
> > Hi HBase users,
> >
> > We have created an index table (say T2) of another table (say T1). The
> > clients who write to T1 also write an index record to T2 with the same
> > timestamp. Inconsistencies may accumulate as time goes by. So we
> > run an MR job periodically, which fully scans T1, builds an index, and
> > bulk-loads the result into T2.
> >
> > Because the MR job may run for a while, all new data written to T2
> > during that period must be kept and not be overwritten. So the MR job
> > writes its puts using the timestamp at which the job starts.
> >
> > Then we want all data in T2 before a given timestamp to become invisible
> > to reads after the index builds successfully, and to be deleted
> > eventually (e.g. by major compaction). We prefer setting this explicitly
> > over using the TTL feature, for safety, as we want old data to be deleted
> > only once the new data is written. Does HBase support this kind of
> > operation for now?
> >
> > Thanks,
> > Chao
> >

Hi Jean-Marc,

Thanks for the reply.

I see that a delete can specify a timestamp, but I don't think that is what
I need. To clarify, in my scenario I don't want to issue deletes for every
key (because I don't know exactly what to delete unless I do another full
scan).

I'd like to see if this is possible: set a min_timestamp on the
ColumnDescriptor. Once it is set, KVs before this timestamp become invisible
to reads. During major compaction, these KVs are deleted. It is an
absolute-timestamp version of TTL.
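
To make the intended semantics concrete, here is a minimal sketch in plain
Java. It is a toy in-memory model, not HBase code: the class
MinTimestampSketch, its Cell type, and the setMinTimestamp and majorCompact
methods are all hypothetical names made up for illustration, and
min_timestamp itself is only the setting proposed above, not an existing
HBase option as far as I know.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the proposed min_timestamp behaviour.
// Nothing here is an HBase API.
final class MinTimestampSketch {

    // A stored key-value: row key, write timestamp, value.
    static final class Cell {
        final String row;
        final long timestamp;
        final String value;

        Cell(String row, long timestamp, String value) {
            this.row = row;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final List<Cell> cells = new ArrayList<>();

    // Hypothetical per-column-family cutoff; 0 means "no cutoff".
    private long minTimestamp = 0L;

    void put(String row, long timestamp, String value) {
        cells.add(new Cell(row, timestamp, value));
    }

    void setMinTimestamp(long minTimestamp) {
        this.minTimestamp = minTimestamp;
    }

    // Read path: cells written before the cutoff are hidden but still stored.
    List<Cell> scan() {
        List<Cell> visible = new ArrayList<>();
        for (Cell c : cells) {
            if (c.timestamp >= minTimestamp) {
                visible.add(c);
            }
        }
        return visible;
    }

    // Models major compaction: physically drops the hidden cells
    // and returns how many were removed.
    int majorCompact() {
        int before = cells.size();
        cells.removeIf(c -> c.timestamp < minTimestamp);
        return before - cells.size();
    }

    int storedCellCount() {
        return cells.size();
    }

    public static void main(String[] args) {
        MinTimestampSketch t2 = new MinTimestampSketch();
        t2.put("k1", 100L, "stale index entry");  // written before the job
        t2.put("k1", 200L, "rebuilt by MR job");  // job start timestamp
        t2.put("k2", 250L, "live client write");  // written while the job ran

        t2.setMinTimestamp(200L);                 // job finished successfully
        System.out.println("visible cells: " + t2.scan().size());
        System.out.println("stored cells:  " + t2.storedCellCount());
        System.out.println("compacted away: " + t2.majorCompact());
    }
}
```

The point of the sketch is that the cutoff is a single piece of metadata: no
per-key delete is issued, so no second full scan is needed, and writes made
while the job was running survive because their timestamps are not below the
job's start timestamp.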
