hbase-user mailing list archives

From Chao Shi <stepi...@live.com>
Subject Delete all data before a given timestamp
Date Mon, 15 Jul 2013 10:36:28 GMT
Hi HBase users,

We have created an index table (say T2) for another table (say T1). The
clients that write to T1 also write an index record to T2 with the same
timestamp. Inconsistencies between the two tables may accumulate over time,
so we periodically run an MR job that fully scans T1, builds an index, and
bulk-loads the result into T2.

The MR job may run for a while, and all new data written to T2 during that
period must be kept and not overwritten. The MR job therefore creates its
puts with the timestamp at which the job started.

Once the index has been built successfully, we want all data in T2 older
than a given timestamp to become invisible to reads and eventually be
deleted (e.g. during major compaction). For safety, we prefer to set this
cutoff explicitly rather than use the TTL feature, because we want old data
to be deleted only once the new data has been written. Does HBase currently
support this kind of operation?
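
To make the intended semantics concrete, here is a minimal sketch in plain
Java. It does not use the HBase client API and is not a claim about how
HBase implements anything: the class IndexRebuildSketch, its methods, the
row keys, and the timestamps are all hypothetical and only model versioned
cells in memory. It shows the two properties we are after: a rebuilt record
stamped with the job start time does not hide a newer client write, and
rows whose newest version is older than the cutoff stop being readable and
can later be purged.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of index table T2; not HBase client code.
final class IndexRebuildSketch {
    // row key -> (timestamp -> value), timestamps sorted ascending
    private final Map<String, TreeMap<Long, String>> rows = new TreeMap<>();

    // Store one version of a row at an explicit timestamp.
    public void put(String row, long ts, String value) {
        rows.computeIfAbsent(row, k -> new TreeMap<>()).put(ts, value);
    }

    // Return the newest value of a row, or null if the row is missing
    // or its newest version is older than the cutoff.
    public String read(String row, long cutoff) {
        TreeMap<Long, String> versions = rows.get(row);
        if (versions == null || versions.isEmpty()) {
            return null;
        }
        Map.Entry<Long, String> newest = versions.lastEntry();
        return newest.getKey() >= cutoff ? newest.getValue() : null;
    }

    // Physically drop every version older than the cutoff. This stands in
    // for the eventual cleanup we would like to happen at major compaction.
    public void purgeBefore(long cutoff) {
        for (TreeMap<Long, String> versions : rows.values()) {
            versions.headMap(cutoff, false).clear();
        }
        rows.values().removeIf(TreeMap::isEmpty);
    }

    public int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        long jobStart = 100L; // assumed start time of the MR job

        IndexRebuildSketch t2 = new IndexRebuildSketch();
        t2.put("stale", 50L, "old");      // leftover record with no match in T1
        t2.put("k1", 120L, "live");       // client write while the job runs
        t2.put("k1", jobStart, "rebuilt"); // bulk-loaded with the job start time
        t2.put("k2", jobStart, "rebuilt");

        System.out.println(t2.read("k1", jobStart));    // newer client write wins
        System.out.println(t2.read("k2", jobStart));    // rebuilt record is visible
        System.out.println(t2.read("stale", jobStart)); // hidden by the cutoff

        t2.purgeBefore(jobStart);
        System.out.println(t2.rowCount());
    }
}
```

The cutoff is passed explicitly to read and purgeBefore, which mirrors our
preference for setting it only after the rebuild has succeeded instead of
relying on a TTL.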

