hbase-user mailing list archives

From tobe <tobeg3oo...@gmail.com>
Subject Re: Should scan check the limitation of the number of versions?
Date Tue, 26 Aug 2014 12:03:07 GMT
Sorry for ignoring @nicolas's question. In my opinion, we should not see
this version, because it is overdue data once we set {VERSIONS => 1}.
Actually, we can't see it once the compaction occurs, right? So I would
prefer consistent behaviour no matter what happens on the server.
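
To make the inconsistency concrete, here is a toy model in Python. It is
only a sketch and not HBase code: Cell, purge_excess_versions, read_naive
and read_version_aware are hypothetical names, and I am assuming a
half-open time range [min, max). read_naive filters by time range first,
so its answer depends on whether the cleanup has already run;
read_version_aware applies the version limit before the time range, so
its answer does not.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Cell:
    """One version of a single column (hypothetical, for illustration)."""
    timestamp: int
    value: str


def newest_first(cells: List[Cell]) -> List[Cell]:
    return sorted(cells, key=lambda c: c.timestamp, reverse=True)


def purge_excess_versions(cells: List[Cell], max_versions: int) -> List[Cell]:
    """Model the server-side cleanup: keep only the newest max_versions."""
    return newest_first(cells)[:max_versions]


def read_naive(cells: List[Cell], max_versions: int,
               time_range: Tuple[int, int]) -> List[Cell]:
    """Filter by time range first, then apply the version limit."""
    lo, hi = time_range
    in_range = [c for c in cells if lo <= c.timestamp < hi]
    return newest_first(in_range)[:max_versions]


def read_version_aware(cells: List[Cell], max_versions: int,
                       time_range: Tuple[int, int]) -> List[Cell]:
    """Apply the version limit over all timestamps, then the time range."""
    lo, hi = time_range
    visible = newest_first(cells)[:max_versions]
    return [c for c in visible if lo <= c.timestamp < hi]


if __name__ == "__main__":
    stored = [Cell(100, "value1"), Cell(200, "value1")]
    cleaned = purge_excess_versions(stored, max_versions=1)

    # Same request, different answers before and after the cleanup.
    print(read_naive(stored, 1, (0, 150)))
    print(read_naive(cleaned, 1, (0, 150)))

    # Same request, same (empty) answer both times.
    print(read_version_aware(stored, 1, (0, 150)))
    print(read_version_aware(cleaned, 1, (0, 150)))
```

With the two puts from my repro, read_naive returns the timestamp-100 cell
before the cleanup and nothing after it, while read_version_aware returns
nothing in both cases. The price is that the version-aware read has to
look at cells newer than the requested range, which is exactly the early
skip that @nicolas mentioned.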


On Tue, Aug 26, 2014 at 7:56 PM, tobe <tobeg3oogle@gmail.com> wrote:

> Thanks @nicolas, @andrew and @lars. Problems like this always come down
> to "by design". It depends on the semantics that HBase provides.
>
> As a user, I don't expect different results when I send the same request
> at the same time. I don't care how HBase operates internally; I expect
> the process to be deterministic and predictable. So if the answer is that
> it depends on whether the server has run a compaction or not, I would
> prefer more deterministic semantics.
>
>
> On Tue, Aug 26, 2014 at 4:54 PM, Nicolas Liochon <nkeywal@gmail.com>
> wrote:
>
>> (moving to user)
>>
>> In your first scenario (put "table", "row1", "cf:a", "value1", 100 then
>> put "table", "row1", "cf:a", "value1", 200), there is no deletion, so the
>> setting KEEP_DELETED_CELLS is not used at all.
>> The behavior you describe is "as expected": there are two versions until
>> the compaction occurs and removes the version that is no longer needed,
>> depending on the configuration.
>> There are some optimizations around this: we skip reading early if the
>> timestamps of what we're reading are not in the scan range. So we don't
>> know if there is a newer value.
>>
>> What's the use case you're looking at?
>>
>> Nicolas
>>
>> On Tue, Aug 26, 2014 at 3:36 AM, tobe <tobeg3oogle@gmail.com> wrote:
>>
>> > @andrew Actually I don't want to see the row in TIMERANGE => [0, 150]
>> > because it's the overdue version. Should I set
>> > {KEEP_DELETED_CELLS => 'true'}? My problem is that even though I don't
>> > keep deleted cells, I get a result which is not what I expect.
>> >
>> >
>> > On Tue, Aug 26, 2014 at 9:24 AM, Andrew Purtell <apurtell@apache.org>
>> > wrote:
>> >
>> > > On Mon, Aug 25, 2014 at 6:13 PM, tobe <tobeg3oogle@gmail.com> wrote:
>> > >
>> > > > @lars I have set {KEEP_DELETED_CELLS => 'false'} in that table. I
>> > > > will get the same result before manually running `flush`. You can
>> > > > try the commands I gave and it's 100% repro.
>> > > >
>> > >
>> > > You need KEEP_DELETED_CELLS => 'true'.
>> > >
>> > > >
>> > > > On Tue, Aug 26, 2014 at 2:20 AM, lars hofhansl <larsh@apache.org>
>> > > > wrote:
>> > > >
>> > > > > Queries of past time ranges only work correctly when
>> > > > > KEEP_DELETED_CELLS is enabled for the column families.
>> > > > >
>> > > > >
>> > > > > ________________________________
>> > > > >  From: tobe <tobeg3oogle@gmail.com>
>> > > > > To: hbase-dev <dev@hbase.apache.org>
>> > > > > Cc: "user@hbase.apache.org" <user@hbase.apache.org>
>> > > > > Sent: Monday, August 25, 2014 4:32 AM
>> > > > > Subject: Re: Should scan check the limitation of the number of
>> > > > > versions?
>> > > > >
>> > > > >
>> > > > > I haven't read the code deeply, but I have an idea (not sure
>> > > > > whether it's right or not). When we scan the columns, we skip
>> > > > > the ones which don't match (deleted). Can we use a counter to
>> > > > > record this? For each skip, we add one until it reaches the
>> > > > > configured limit on the number of versions. But we have to
>> > > > > consider MVCC and other things, which makes it more complex.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Mon, Aug 25, 2014 at 5:54 PM, tobe <tobeg3oogle@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > So far, I have found two problems with this.
>> > > > > >
>> > > > > > Firstly, HBASE-11675
>> > > > > > <https://issues.apache.org/jira/browse/HBASE-11675>.
>> > > > > > It's a little tricky and rarely happens, but it requires users
>> > > > > > to be careful about compaction, which occurs on the server
>> > > > > > side. They may get different results before and after a major
>> > > > > > compaction.
>> > > > > >
>> > > > > > Secondly, suppose you put a value with timestamp 100, then put
>> > > > > > another value on the same column with timestamp 200, with the
>> > > > > > number of versions set to 1. When we get the value of this
>> > > > > > column, we get the latest one with timestamp 200, and that's
>> > > > > > right. But if I get with a time range from 0 to 150, I may get
>> > > > > > the first value with timestamp 100 before compaction happens.
>> > > > > > After compaction happens, you will never get this value, even
>> > > > > > if you run the same command.
>> > > > > >
>> > > > > > It's easy to repro; follow these steps:
>> > > > > > hbase(main):001:0> create "table", "cf"
>> > > > > > hbase(main):003:0> put "table", "row1", "cf:a", "value1", 100
>> > > > > > hbase(main):003:0> put "table", "row1", "cf:a", "value1", 200
>> > > > > > hbase(main):026:0> get "table", "row1", {TIMERANGE => [0, 150]}  // before flush
>> > > > > >    row1      column=cf:a, timestamp=100, value=value1
>> > > > > > hbase(main):060:0> flush "table"
>> > > > > > hbase(main):082:0> get "table", "row1", {TIMERANGE => [0, 150]}  // after flush
>> > > > > >    0 row(s) in 0.0050 seconds
>> > > > > >
>> > > > > > I think the reason is that we have three restrictions that
>> > > > > > remove data: delete, TTL and versions. Any time we get or scan
>> > > > > > the data, we check the delete marker and the TTL to make sure
>> > > > > > such data is not returned to users. But for versions, we don't
>> > > > > > check this limit. Our output relies on compaction to clean up
>> > > > > > the overdue data. Is it possible to add this check within scan
>> > > > > > (get is implemented as scan)?
>> > > > > >
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Best regards,
>> > >
>> > >    - Andy
>> > >
>> > > Problems worthy of attack prove their worth by hitting back. - Piet
>> > > Hein (via Tom White)
>> > >
>> >
>>
>
>
