hbase-dev mailing list archives

From Anoop John <anoop.hb...@gmail.com>
Subject Re: Nondeterministic outcome based on cell TTL and major compaction event order
Date Mon, 20 Apr 2015 03:10:51 GMT
Interesting example for cell-level TTL, Michael.
But one thing I want to say: in the above example, the versions for the
corresponding CF should have been >1. In that case there won't be an issue
with major compaction, right?
When versions = 1, yes, it will give nondeterministic results.

-Anoop-
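Anoop's point about VERSIONS can be checked with a toy model of a single (row, family, qualifier) column, using the two writes from Cosmin's original post further down the thread. This is a hypothetical Python sketch, not HBase code: `live` and `outcome` are invented names, time is counted in abstract minutes, a cell's TTL is assumed to run from its own timestamp, and a major compaction is modeled as physically keeping only what a read would see at that moment (expired cells dropped first, then the newest `max_versions` kept).

```python
from typing import List, Optional, Tuple

# Toy cell: (timestamp, value, ttl). ttl=None means the cell never expires.
Cell = Tuple[int, str, Optional[int]]


def live(cells: List[Cell], now: int, max_versions: int) -> List[Cell]:
    """Drop expired cells, then keep the newest max_versions versions."""
    alive = [c for c in cells if c[2] is None or now < c[0] + c[2]]
    alive.sort(key=lambda c: c[0], reverse=True)
    return alive[:max_versions]


def outcome(compact_at: int, read_at: int, max_versions: int) -> List[str]:
    # An old cell at ts=1, then a newer one at ts=2 with TTL=1 (expires at 3).
    store: List[Cell] = [(1, "old", None), (2, "new", 1)]
    store = live(store, compact_at, max_versions)  # toy major compaction
    return [value for _, value, _ in live(store, read_at, max_versions)]


for mv in (1, 2):
    early = outcome(compact_at=2, read_at=5, max_versions=mv)  # compact before expiry
    late = outcome(compact_at=4, read_at=5, max_versions=mv)   # compact after expiry
    verdict = "deterministic" if early == late else "order-dependent"
    print(f"VERSIONS={mv}: {early} vs {late} -> {verdict}")
```

In this model VERSIONS=1 gives `[]` or `['old']` depending on when the compaction runs, while VERSIONS=2 gives `['old']` either way. That holds here only because a single newer version is written; if more TTL'd versions are written than the family keeps, the same race comes back.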


On Sun, Apr 19, 2015 at 6:59 PM, Michael Segel <michael_segel@hotmail.com>
wrote:

> Actually I just thought of a better example…
>
> Credit Card Fraud detection.
> Imagine you’re being sent to work on a project out of the country.
> So suppose I head across the pond and invade Europe. ;-P
>
> I would want the credit card companies to not weigh a foreign transaction
> heavily when determining fraud, so that if they know my location is in
> London, then spending $$ on a dinner in London is not fraud.
>
> So I call ahead and tell my bank I’m going to be in Europe for XXX months.
>
>
> >
> > As to why you would want a TTL on a column that doesn’t always use a
> > TTL?
> >
> > I used this example in a different post…
> >
> > Imagine you have a road link which has an attribute of speed.
> >
> > You could have construction, or variable speed limits.
> > So you would want to change the speed limit with a TTL.
> >
> > Or you’re a retailer and you’re offering a 20% discount on a product for
> > a limited time only?
> >
> > Sure, these are bad examples, because in reality the database is a sink
> > and the application would manage these types of issues.
> >
> >
> >> On Apr 18, 2015, at 12:23 AM, lars hofhansl <larsh@apache.org> wrote:
> >>
> >> The formatting did not come out right. Lemme try again...
> >>
> >>
> >> Just came here to say that. From our (maybe not clearly enough) defined
> >> semantics, this is how it should behave.
> >>
> >> It _is_ confusing, though, since compactions are - in a sense - just
> >> optimizations that run in the background to keep the number of HFiles
> >> from growing unbounded.
> >> In this case the schedule of the compactions influences the outcome.
> >>
> >> Note that even tombstone markers can be confusing. Here's another
> >> confusing example:
> >> 1. delete (r1, f1, q1, T2)
> >> 2. put (r1, f1, q1, v1, T1)
> >>
> >> If a compaction happens after #1 but before #2, the put will remain:
> >> delete
> >> compaction
> >> put (remains visible)
> >>
> >> If the compaction happens after #2, the put will be affected by the
> >> delete and hence removed:
> >> delete
> >> put
> >> compaction (will remove the put)
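Lars' two orderings can be replayed with a toy model of delete markers. This is a hypothetical Python sketch rather than HBase internals: `masked`, `visible`, `major_compact` and the two ordering functions are invented names, it assumes T1 < T2, assumes the marker hides every version with timestamp <= T2 (a column-wide delete rather than a single-version one), and assumes a major compaction applies the markers and then discards them.

```python
from typing import List, Tuple

Put = Tuple[int, str]  # (timestamp, value)
T1, T2 = 1, 2          # assumption: the put's timestamp is older than the delete's


def masked(ts: int, markers: List[int]) -> bool:
    """A marker at d hides every version with timestamp <= d."""
    return any(ts <= d for d in markers)


def visible(puts: List[Put], markers: List[int]) -> List[str]:
    return [value for ts, value in puts if not masked(ts, markers)]


def major_compact(puts: List[Put], markers: List[int]):
    """Toy major compaction: purge masked puts, then drop the markers themselves."""
    return [p for p in puts if not masked(p[0], markers)], []


def delete_compact_put() -> List[str]:
    puts, markers = [], [T2]                      # 1. delete (r1, f1, q1, T2)
    puts, markers = major_compact(puts, markers)  # compaction removes the marker
    puts.append((T1, "v1"))                       # 2. put (r1, f1, q1, v1, T1)
    return visible(puts, markers)


def delete_put_compact() -> List[str]:
    puts, markers = [], [T2]                      # 1. delete
    puts.append((T1, "v1"))                       # 2. put, hidden by the marker
    puts, markers = major_compact(puts, markers)  # compaction purges the put
    return visible(puts, markers)


print(delete_compact_put())  # ['v1']
print(delete_put_compact())  # []
```

In the second ordering the put is already hidden by the marker before the compaction runs; the compaction only makes that permanent. In the first ordering the marker is gone by the time the put arrives, so nothing masks it.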
> >>
> >> Notice though that both of these examples _are_ a bit weird.
> >> Why would only a newer version of the cell have a TTL?
> >> Why would you date a delete into the future?
> >>
> >> -- Lars
> >>
> >>
> >>
> >>
> >> ________________________________
> >> From: lars hofhansl <larsh@apache.org>
> >> To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> >> Sent: Friday, April 17, 2015 10:18 PM
> >> Subject: Re: Nondeterministic outcome based on cell TTL and major
> >> compaction event order
> >>
> >>
> >> From: Sean Busbey <busbey@cloudera.com>
> >> To: dev <dev@hbase.apache.org>
> >> Sent: Friday, April 17, 2015 4:52 PM
> >> Subject: Re: Nondeterministic outcome based on cell TTL and major
> >> compaction event order
> >>
> >> If you have max versions set to 1 (the default), then c1 should be
> >> removed at compaction time if c2 still exists then.
> >>
> >> --
> >> Sean
> >>
> >>
> >> On Apr 17, 2015 6:41 PM, "Michael Segel" <michael_segel@hotmail.com> wrote:
> >>
> >>> Ok,
> >>> So then if you have a previous cell (c1) and you insert a new cell c2
> >>> that has a TTL of, let’s say, 5 mins, then c1 should always exist?
> >>> That is my understanding, but from Cosmin’s post, he’s saying it’s
> >>> different. And that’s why I don’t understand. You couldn’t lose the
> >>> cell c1 at all.
> >>> Compaction or no compaction.
> >>> Compaction or no compaction.
> >>>
> >>> That’s why I’m confused.  Current behavior doesn’t match the expected
> >>> contract.
> >>>
> >>> -Mike
> >>>
> >>>> On Apr 17, 2015, at 4:37 PM, Andrew Purtell <apurtell@apache.org> wrote:
> >>>>
> >>>> The way TTLs work today is they define the interval of time a cell
> >>>> exists - exactly as that. There is no tombstone laid like a normal
> >>>> delete. Once the TTL elapses the cell just ceases to exist to normal
> >>>> scanners. The interaction of expired cells, multiple versions, minimum
> >>>> versions, raw scanners, etc. can be confusing. We can absolutely
> >>>> revisit this.
> >>>>
> >>>> A cell with an expired TTL could be treated as the combination of
> >>>> tombstone and the most recent value it lays over. This is not how the
> >>>> implementation works today, but could be changed for an upcoming major
> >>>> version like 2.0 if there's consensus to do it.
> >>>>
> >>>>
> >>>>> On Apr 10, 2015, at 7:26 AM, Cosmin Lehene <clehene@adobe.com> wrote:
> >>>>>
> >>>>> I was initially puzzled by this, although I realize it's likely
> >>>>> as designed.
> >>>>>
> >>>>>
> >>>>> The ordering of cell TTL expiration and compaction events can lead to
> >>>>> either some (the older) data being left or no data at all for a
> >>>>> particular (row, family, qualifier, ts) coordinate.
> >>>>>
> >>>>>
> >>>>>
> >>>>> Write (r1, f1, q1, v1, 1)
> >>>>>
> >>>>> Write (r1, f1, q1, v1, 2) - TTL=1 minute
> >>>>>
> >>>>>
> >>>>> Scenario 1:
> >>>>>
> >>>>>
> >>>>> If a major compaction happens within a minute
> >>>>>
> >>>>>
> >>>>> it will remove (r1, f1, q1, v1, 1)
> >>>>>
> >>>>> then after a minute (r1, f1, q1, v1, 2) will expire
> >>>>>
> >>>>> no data left
> >>>>>
> >>>>>
> >>>>> Scenario 2:
> >>>>>
> >>>>>
> >>>>> A minute passes
> >>>>>
> >>>>> (r1, f1, q1, v1, 2) expires
> >>>>>
> >>>>> Compaction runs.
> >>>>>
> >>>>> (r1, f1, q1, v1, 1) remains
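Cosmin's two scenarios can be replayed with a small hypothetical model of one column. This is a Python sketch, not the HBase API: `Cell`, `ToyColumn` and `scenario` are invented names, time is counted in abstract minutes, the TTL is assumed to run from the cell's own timestamp, reads skip expired cells before counting versions, and a major compaction is modeled as keeping exactly what a read would return at that moment. The two writes get distinct values ("old", "new"), whereas the thread writes v1 for both, so the result shows which one survived.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    ts: int                    # version timestamp, in abstract minutes
    value: str
    ttl: Optional[int] = None  # minutes counted from ts; None = never expires

    def expired(self, now: int) -> bool:
        return self.ttl is not None and now >= self.ts + self.ttl


@dataclass
class ToyColumn:
    """One (row, family, qualifier) coordinate in a hypothetical store."""
    max_versions: int = 1
    cells: List[Cell] = field(default_factory=list)

    def put(self, cell: Cell) -> None:
        self.cells.append(cell)

    def _visible(self, now: int) -> List[Cell]:
        # Expired cells are skipped before versions are counted.
        alive = sorted((c for c in self.cells if not c.expired(now)),
                       key=lambda c: c.ts, reverse=True)
        return alive[: self.max_versions]

    def get(self, now: int) -> List[str]:
        return [c.value for c in self._visible(now)]

    def major_compact(self, now: int) -> None:
        # Physically keep only what a read at `now` would return.
        self.cells = self._visible(now)


def scenario(compact_at: int, read_at: int = 5) -> List[str]:
    col = ToyColumn(max_versions=1)
    col.put(Cell(ts=1, value="old"))
    col.put(Cell(ts=2, value="new", ttl=1))  # "new" expires at minute 3
    col.major_compact(compact_at)
    return col.get(read_at)


print(scenario(compact_at=2))  # Scenario 1: [] ("old" dropped before "new" expired)
print(scenario(compact_at=4))  # Scenario 2: ['old'] ("new" had expired, "old" kept)
```

Only the compaction time differs between the two runs, and that alone decides whether any data is left.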
> >>>>>
> >>>>>
> >>>>>
> >>>>> This seems, by and large, to be expected behavior, but it still seems
> >>>>> "uncomfortable" that the (overall) outcome is not decided by me, but by
> >>>>> the chance ordering of events.
> >>>>>
> >>>>>
> >>>>> I wonder whether we'd want this to behave differently (perhaps it has
> >>>>> been discussed already), but if not, it's worth documenting in more
> >>>>> detail in the book.
> >>>>>
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>>
> >>>>> Cosmin
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>> --
> >>>> Best regards,
> >>>>
> >>>> - Andy
> >>>>
> >>>> Problems worthy of attack prove their worth by hitting back. - Piet
> >>>> Hein (via Tom White)
> >>>>
> >>>
> >>> The opinions expressed here are mine, while they may reflect a cognitive
> >>> thought, that is purely accidental.
> >>> Use at your own risk.
> >>> Michael Segel
> >>> michael_segel (AT) hotmail.com
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >
>
