hbase-dev mailing list archives

From tobe <tobeg3oo...@gmail.com>
Subject Should scan check the limitation of the number of versions?
Date Mon, 25 Aug 2014 09:54:25 GMT
So far, I have found two problems with this.

Firstly, HBASE-11675 <https://issues.apache.org/jira/browse/HBASE-11675>.
It's a little tricky and rarely happens, but it means users have to be
careful about compaction, which occurs on the server side: they may get
different results before and after a major compaction.

Secondly, suppose you put a value with timestamp 100, then put another value
in the same column with timestamp 200, with the number of versions set to 1.
When we get the value of this column, we get the latest one with
timestamp 200, and that's right. But if I get with a time range from 0 to
150, I may get the first value with timestamp 100 before compaction
happens. After compaction happens, you will never get this value, even
if you run the same command.

It's easy to repro; follow these steps:
hbase(main):001:0> create "table", "cf"
hbase(main):003:0> put "table", "row1", "cf:a", "value1", 100
hbase(main):003:0> put "table", "row1", "cf:a", "value1", 200
hbase(main):026:0> get "table", "row1", {TIMERANGE => [0, 150]}  # before
   row1      column=cf:a, timestamp=100, value=value1
hbase(main):060:0> flush "table"
hbase(main):082:0> get "table", "row1", {TIMERANGE => [0, 150]}  # after
   0 row(s) in 0.0050 seconds
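
The same sequence can be mimicked without a cluster. The following is only a toy model in Python, not HBase code: the class ToyColumn and its methods are hypothetical names, and it assumes the read path filters by time range first and only then counts versions, while cleanup keeps the newest versions regardless of any time range.

```python
from dataclasses import dataclass, field


@dataclass
class ToyColumn:
    """One column with a version limit. A toy stand-in, not HBase's storage engine."""
    max_versions: int = 1
    cells: dict = field(default_factory=dict)  # timestamp -> value

    def put(self, value, ts):
        self.cells[ts] = value

    def get(self, min_ts, max_ts):
        # Assumed read path: filter by the half-open range [min_ts, max_ts),
        # then keep at most max_versions of the newest matching cells.
        hits = sorted((ts for ts in self.cells if min_ts <= ts < max_ts), reverse=True)
        return [(ts, self.cells[ts]) for ts in hits[: self.max_versions]]

    def compact(self):
        # Assumed cleanup path: physically drop all but the newest max_versions cells.
        keep = sorted(self.cells, reverse=True)[: self.max_versions]
        self.cells = {ts: self.cells[ts] for ts in keep}


col = ToyColumn(max_versions=1)
col.put("value1", 100)
col.put("value1", 200)
before = col.get(0, 150)   # the cell at timestamp 100 is still stored
col.compact()
after = col.get(0, 150)    # the cell at timestamp 100 has been dropped
print(before, after)
```

In this model the same read returns the old cell before cleanup and nothing afterwards, which matches the transcript above.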

I think the reason is that we have three restrictions that remove data:
deletes, TTL and versions. Any time we get or scan the data, we check
the delete markers and the TTL to make sure such data is not returned to
users. But for versions, we don't check this limit. Our output relies on
compaction to clean up the excess versions. Is it possible to add this
check within scan (get is implemented as scan)?
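
One way to picture the proposed check, again as a hedged Python sketch and not a patch against HBase: the helper names get_with_version_check and compact are hypothetical, and the sketch assumes the version limit would be applied across all stored cells of the column before the time range is evaluated.

```python
def get_with_version_check(cells, max_versions, min_ts, max_ts):
    """Hypothetical read path: apply the version limit first, then the time range."""
    # Only the newest max_versions cells are considered visible at all.
    visible = sorted(cells, reverse=True)[:max_versions]
    # The half-open time range [min_ts, max_ts) is applied to the visible cells only.
    return [(ts, cells[ts]) for ts in visible if min_ts <= ts < max_ts]


def compact(cells, max_versions):
    """Toy cleanup: keep only the newest max_versions cells."""
    keep = sorted(cells, reverse=True)[:max_versions]
    return {ts: cells[ts] for ts in keep}


cells = {100: "value1", 200: "value1"}  # timestamp -> value, as in the repro
before = get_with_version_check(cells, 1, 0, 150)
after = get_with_version_check(compact(cells, 1), 1, 0, 150)
print(before, after)
```

Under these assumptions the read gives the same answer whether or not cleanup has run, at the price of hiding the older cell from time-range reads even while it is still stored.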
