hbase-dev mailing list archives

From Cosmin Lehene <cleh...@adobe.com>
Subject Re: Nondeterministic outcome based on cell TTL and major compaction event order
Date Mon, 13 Apr 2015 23:12:51 GMT
The ambiguity seems to lie at the intersection of TTL and version "garbage collection" during
compactions. 

Major compactions can lead to nondeterministic results when multiple versions are involved
(partially covered in the book, http://hbase.apache.org/book.html#versions, and in
http://www.ngdata.com/bending-time-in-hbase/).
TTL expirations don't result in deletes (at least not in the classical sense with a tombstone).
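
For illustration only, the two orderings quoted further down can be reproduced with a toy
model in Python. This is a sketch, not HBase code: the names Cell, major_compact, read and
run are hypothetical, expiry is modelled as an absolute time on a made-up clock in seconds,
and the column family is assumed to keep a single version (max_versions=1), which the
scenarios seem to imply.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    ts: int                             # version timestamp (1 and 2 in the thread)
    expires_at: Optional[float] = None  # toy-clock seconds; None means no TTL

    def expired(self, now: float) -> bool:
        return self.expires_at is not None and now >= self.expires_at


def major_compact(cells: List[Cell], now: float, max_versions: int = 1) -> List[Cell]:
    """Physically drop expired cells, then keep only the newest max_versions."""
    live = [c for c in cells if not c.expired(now)]
    live.sort(key=lambda c: c.ts, reverse=True)
    return live[:max_versions]


def read(cells: List[Cell], now: float) -> Optional[Cell]:
    """Return the newest unexpired cell, or None if nothing is visible."""
    live = [c for c in cells if not c.expired(now)]
    return max(live, key=lambda c: c.ts) if live else None


def run(compact_at: float, read_at: float) -> Optional[Cell]:
    # Write ts=1 without a TTL, then ts=2 with a one-minute TTL (expires at t=60).
    store = [Cell(ts=1), Cell(ts=2, expires_at=60.0)]
    store = major_compact(store, now=compact_at)
    return read(store, now=read_at)


# Scenario 1: compaction at t=30 (before expiry), read at t=90.
print(run(compact_at=30.0, read_at=90.0))
# Scenario 2: compaction at t=90 (after expiry), read at t=120.
print(run(compact_at=90.0, read_at=120.0))
```

Under these assumptions the first call finds no cell at all, while the second still finds
the ts=1 cell. The only difference between the two calls is when the compaction ran.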

Cosmin 

_______________________________________
From: Michael Segel <michael_segel@hotmail.com>
Sent: Friday, April 10, 2015 8:35 AM
To: dev@hbase.apache.org
Subject: Re: Nondeterministic outcome based on cell TTL and major compaction event order

Interesting.
There seems to be some ambiguity in what happens between a TTL and a deletion.

Is the TTL a delete or is it a separate type of function?

That is to say, when you inserted version 2 of the cell, did you intend version 2 to exist
for a little while and then fall back to version 1? Or did you mean that inserting version 2
should delete everything prior to it, so that when version 2 expires nothing is left?

The documentation isn’t clear on this point.

To give you an example where you wouldn’t want to have the TTL on a cell also delete prior
versions…

Suppose you’re storing map data in HBase. You have an attribute (speed) associated with a
road link.

If the road is a 65 MPH highway, then the base speed (default speed) is 65 MPH. However, if
there’s construction planned for the road, then you need to reset the speed to 45 MPH while
the construction lasts. You know that the construction is supposed to last X months, so you
reset the speed limit to 45 with a TTL on that cell version only.

Another example is if you’re storing the price for a given SKU in a given region of your
retail chain. Say you want to reduce the price by 20% for a 2-week period.
Again, you set that discount to live for 2 weeks with a TTL, then revert to the original
price.
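
To make the two possible readings of a TTL concrete, here is a small Python sketch using the
speed-limit example. It is only a toy model: the function names read_overlay and
read_replace, the tuple layout, and the day numbers are all hypothetical, and neither
function is claimed to be what HBase actually does.

```python
from typing import List, Optional, Tuple

# (version, value, expires_at); expires_at is a day number, None means no TTL.
Version = Tuple[int, int, Optional[float]]


def is_live(v: Version, now: float) -> bool:
    return v[2] is None or now < v[2]


def read_overlay(versions: List[Version], now: float) -> Optional[int]:
    """Reading A: a TTL'd version is a temporary overlay on older versions."""
    live = [v for v in versions if is_live(v, now)]
    return max(live, key=lambda v: v[0])[1] if live else None


def read_replace(versions: List[Version], now: float) -> Optional[int]:
    """Reading B: the newest write replaces older ones, then expires itself."""
    if not versions:
        return None
    newest = max(versions, key=lambda v: v[0])
    return newest[1] if is_live(newest, now) else None


# Base speed 65 with no TTL; construction speed 45 expiring on (made-up) day 90.
speeds: List[Version] = [(1, 65, None), (2, 45, 90.0)]

for day in (30.0, 120.0):
    print(day, read_overlay(speeds, day), read_replace(speeds, day))
```

During construction both readings return 45. After the TTL passes, the overlay reading
falls back to 65, while the replace reading returns nothing.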

So I guess there should be a clarification of what the TTL is intended to do.

Does that make sense?





> On Apr 10, 2015, at 9:26 AM, Cosmin Lehene <clehene@adobe.com> wrote:
>
> I was initially puzzled by this, although I realize it is likely working as designed.
>
>
> The order of cell TTL expiration and compaction events can leave either some (the older)
> data or no data at all for a particular (row, family, qualifier, ts) coordinate.
>
>
>
> Write (r1, f1, q1, v1, 1)
>
> Write (r1, f1, q1, v1, 2) - TTL=1 minute
>
>
> Scenario 1:
>
>
> If a major compaction happens within a minute
>
>
> it will remove (r1, f1, q1, v1, 1)
>
> then after a minute (r1, f1, q1, v1, 2) will expire
>
> no data left
>
>
> Scenario 2:
>
>
> A minute passes
>
> (r1, f1, q1, v1, 2) expires
>
> Compaction runs..
>
> (r1, f1, q1, v1, 1) remains
>
>
>
> This seems, by and large, expected behavior, but it still feels "uncomfortable" that the
> (overall) outcome is decided not by me but by the chance ordering of events.
>
>
> I wonder whether we'd want this to behave differently (perhaps it has been discussed
> already), but if not, it's worth more detailed documentation in the book.
>
>
> What do you think?
>
>
> Cosmin
>
>
>
>

The opinions expressed here are mine, while they may reflect a cognitive thought, that is
purely accidental.
Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com





