hbase-user mailing list archives

From tobe <tobeg3oo...@gmail.com>
Subject Re: Should scan check the limitation of the number of versions?
Date Mon, 25 Aug 2014 11:32:26 GMT
I haven't read the code deeply, but I have an idea (not sure whether it's
right or not). When we scan the columns, we skip the ones that don't
match (deleted). Can we use a counter to record this? For each skip, we
add one until the counter reaches the configured maximum number of
versions. But we would also have to consider MVCC and other things, which
makes this more complex.
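
To make the counter idea concrete, here is a minimal sketch in Python. It is a toy model of one reading of the idea, not HBase code: `Cell` and `scan_with_version_counter` are made-up names, deletes and MVCC are ignored, and the time range is treated as half-open [min, max). Every cell of a column counts toward the version limit, newest first, whether or not it matches the time range, so a cell beyond the limit is never returned.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Cell(NamedTuple):
    timestamp: int
    value: str


def scan_with_version_counter(
    cells: Iterable[Cell], max_versions: int, time_range: Tuple[int, int]
) -> List[Cell]:
    """Return the cells of one column that are visible to a time-range read.

    Hypothetical helper: each cell, newest first, bumps the counter even
    when it is skipped for falling outside the time range.
    """
    lo, hi = time_range  # half-open interval [lo, hi)
    counted = 0
    result = []
    for cell in sorted(cells, key=lambda c: c.timestamp, reverse=True):
        if counted >= max_versions:
            break  # version limit reached: older cells are treated as gone
        counted += 1
        if lo <= cell.timestamp < hi:
            result.append(cell)
    return result


cells = [Cell(100, "value1"), Cell(200, "value1")]
print(scan_with_version_counter(cells, 1, (0, 150)))  # []
print(scan_with_version_counter(cells, 2, (0, 150)))  # [Cell(timestamp=100, value='value1')]
```

With a limit of 1, the cell at timestamp 200 uses up the only version slot, so the read over [0, 150) returns nothing regardless of whether a flush or compaction has run.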

On Mon, Aug 25, 2014 at 5:54 PM, tobe <tobeg3oogle@gmail.com> wrote:

> So far, I have found two problems with this.
> Firstly, HBASE-11675 <https://issues.apache.org/jira/browse/HBASE-11675>.
> It's a little tricky and rarely happens, but it requires users to be
> careful about compaction, which occurs on the server side: they may get
> different results before and after a major compaction.
> Secondly, suppose you put a value with timestamp 100, then put another
> value on the same column with timestamp 200, with the maximum number of
> versions set to 1. When we get the value of this column, we get the
> latest one, with timestamp 200, and that's right. But if I get with a
> time range from 0 to 150, I may get the first value, with timestamp 100,
> before compaction happens. After compaction happens, you will never get
> this value, even if you run the same command.
> It's easy to reproduce; follow these steps:
> hbase(main):001:0> create "table", "cf"
> hbase(main):003:0> put "table", "row1", "cf:a", "value1", 100
> hbase(main):003:0> put "table", "row1", "cf:a", "value1", 200
> hbase(main):026:0> get "table", "row1", {TIMERANGE => [0, 150]}  # before flush
>    row1      column=cf:a, timestamp=100, value=value1
> hbase(main):060:0> flush "table"
> hbase(main):082:0> get "table", "row1", {TIMERANGE => [0, 150]}  # after flush
>    0 row(s) in 0.0050 seconds
> I think the reason is that we have three restrictions that remove data:
> delete, TTL and versions. Any time we get or scan the data, we check
> the delete markers and the TTL to make sure such data is not returned to
> users. But for versions, we don't check this limit. Our output relies on
> compaction to clean up the excess versions. Is it possible to add this
> check within scan (get is implemented as a scan)?
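
The inconsistency described in the quoted message can be reproduced with a small toy model in Python. This is only a sketch under stated assumptions, not HBase's actual read path: `read` and `compact` are hypothetical helpers, cells are plain `(timestamp, value)` tuples, and the time range is half-open [min, max). The read applies the time range first and the version limit second, while the cleanup step keeps only the newest versions.

```python
from typing import List, Tuple

Cell = Tuple[int, str]  # (timestamp, value)


def read(cells: List[Cell], max_versions: int, time_range: Tuple[int, int]) -> List[Cell]:
    """Toy read: filter by time range first, then cap the number of versions."""
    lo, hi = time_range  # half-open interval [lo, hi)
    in_range = [c for c in sorted(cells, reverse=True) if lo <= c[0] < hi]
    return in_range[:max_versions]


def compact(cells: List[Cell], max_versions: int) -> List[Cell]:
    """Toy cleanup (flush or compaction): keep only the newest versions."""
    return sorted(cells, reverse=True)[:max_versions]


store = [(100, "value1"), (200, "value1")]

before = read(store, 1, (0, 150))
after = read(compact(store, 1), 1, (0, 150))

print(before)  # [(100, 'value1')]
print(after)   # []
```

The same read returns different results depending only on whether the cleanup has run, which matches the behaviour shown in the shell session above.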
