hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Delete all data before a given timestamp
Date Mon, 15 Jul 2013 16:48:52 GMT
When you send a delete command to the server, you can specify a timestamp.
So, as the result of your MR job, could you "just" emit this delete with the
specific timestamp to remove any previous versions?
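
To illustrate the semantics this relies on, here is a minimal sketch in plain
Java. It is a toy in-memory model of a single index cell, not the HBase client
API: the class IndexRowModel and its methods are hypothetical names made up
for this example. It assumes a row- or family-level delete marker at timestamp
T hides every version with a timestamp less than or equal to T. Under that
assumption, the marker in the sketch is placed at jobStart - 1 so that the
puts the MR job wrote at jobStart stay visible. That offset is an assumption
of this sketch, not something stated in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of one index cell and its versions; NOT the HBase client API. */
class IndexRowModel {
    // Versions of a single cell, keyed by timestamp (assumed non-negative).
    private final TreeMap<Long, String> versions = new TreeMap<>();
    // Highest delete-marker timestamp currently in effect (none yet).
    private long deleteMarkerTs = Long.MIN_VALUE;

    /** Store a version with an explicit timestamp. */
    void put(long ts, String value) {
        versions.put(ts, value);
    }

    /** Record a delete marker: versions with timestamp <= ts become invisible. */
    void deleteUpTo(long ts) {
        deleteMarkerTs = Math.max(deleteMarkerTs, ts);
    }

    /** Values a reader can still see, newest first. */
    List<String> visibleValues() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : versions.descendingMap().entrySet()) {
            if (e.getKey() > deleteMarkerTs) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    /** Simulated major compaction: drop masked versions, then the marker itself. */
    int compact() {
        int before = versions.size();
        versions.headMap(deleteMarkerTs, true).clear();
        deleteMarkerTs = Long.MIN_VALUE;
        return before - versions.size();
    }
}

class IndexRebuildDemo {
    public static void main(String[] args) {
        long jobStart = 200L;                 // hypothetical MR job start time
        IndexRowModel row = new IndexRowModel();
        row.put(100L, "stale");               // old, possibly inconsistent entry
        row.put(250L, "live");                // client write while the job runs
        row.put(jobStart, "rebuilt");         // bulk-loaded result, stamped jobStart
        row.deleteUpTo(jobStart - 1);         // mask everything older than the job
        System.out.println(row.visibleValues()); // [live, rebuilt]
        System.out.println(row.compact());       // 1
    }
}
```

In the model, the stale entry disappears from reads as soon as the marker is
written and is physically dropped only by compact(), which matches the
"invisible now, deleted eventually" behaviour asked for below. With the real
client, the delete would be emitted once the bulk load has succeeded, so old
data is never hidden before its replacement is in place.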

JM

2013/7/15 Chao Shi <stepinto@live.com>

> Hi HBase users,
>
> We have created an index table (say T2) for another table (say T1). The
> clients that write to T1 also write an index record to T2 with the same
> timestamp. Inconsistencies may accumulate over time, so we periodically
> run an MR job that fully scans T1, builds an index, and bulk-loads the
> result into T2.
>
> Because the MR job may run for a while, all new data written to T2 during
> that period must be kept and not overwritten. So the MR job creates its
> puts using the timestamp at which the job started.
>
> Then we want all data in T2 before a given timestamp to become invisible to
> reads after the index is built successfully, and to be deleted eventually
> (e.g. during major compaction). For safety, we prefer setting this
> explicitly rather than using the TTL feature, as we want old data to be
> deleted only once the new data has been written. Does HBase currently
> support this kind of operation?
>
> Thanks,
> Chao
>
