hbase-dev mailing list archives

From Niels Basjes <Ni...@basjes.nl>
Subject Re: HBase returns old values even with max versions = 1
Date Sun, 08 Dec 2013 09:01:03 GMT
Thanks for clarifying this,
I know now why my code didn't work as expected.

For now I think that creating a simple custom Filter for my situation is
the most efficient workaround.
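
A minimal sketch of the decision such a filter would have to make, written as plain Python over in-memory cells rather than against the HBase Filter API. The `Cell` tuple, the `stale_columns` helper, and the extra column `c2:` are hypothetical names for illustration only. The idea: keep just the newest cell of each column, then report it only if its timestamp is older than the cutoff.

```python
from typing import Dict, List, NamedTuple


class Cell(NamedTuple):
    # Hypothetical stand-in for an HBase KeyValue: column, timestamp, value.
    column: str
    timestamp: int
    value: str


def stale_columns(cells: List[Cell], cutoff: int) -> List[Cell]:
    """Return the newest cell of each column, but only if it is older than cutoff."""
    newest: Dict[str, Cell] = {}
    for cell in cells:
        current = newest.get(cell.column)
        if current is None or cell.timestamp > current.timestamp:
            newest[cell.column] = cell
    # A column counts as stale only when its latest write predates the cutoff.
    return [c for c in newest.values() if c.timestamp < cutoff]


# The three puts from the shell session quoted below, plus a hypothetical
# second column 'c2:' that was written only once.
cells = [
    Cell("c1:", 1000, "One"),
    Cell("c1:", 2000, "Two"),
    Cell("c1:", 3000, "Three"),
    Cell("c2:", 1200, "Old"),
]
print(stale_columns(cells, cutoff=1500))
```

With a cutoff of 1500 only `c2:` is reported, because the latest write to `c1:` happened at timestamp 3000.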

Niels Basjes


On Sat, Dec 7, 2013 at 3:26 AM, lars hofhansl <larsh@apache.org> wrote:

> Filed https://issues.apache.org/jira/browse/HBASE-10102
>
>
>
> ________________________________
>  From: lars hofhansl <larsh@apache.org>
> To: "user@hbase.apache.org" <user@hbase.apache.org>; hbase-dev <
> dev@hbase.apache.org>
> Sent: Friday, December 6, 2013 5:31 PM
> Subject: Re: HBase returns old values even with max versions = 1
>
>
> + dev list
>
> Specifically:
>
> Currently the workflow in ScanQueryMatcher is something like this:
>
> 1. <versions> = min(<CF versions>, <scan versions>)
> 2. filter by timerange
> 3. filter out columns (i.e. columns not specified in the scan)
> 4. apply custom filters
> 5. filter by <versions>
>
> Every KV is passed through this filtering process.
>
> What we should do is this:
>
> 1. filter by <CF versions>
> 2. filter by timerange
> 3. filter out columns (i.e. columns not specified in the scan)
> 4. apply custom filters
> 5. filter by <scan versions>
>
> The trick will be doing that efficiently.
>
> -- Lars
>
>
>
> ________________________________
>
> From: lars hofhansl <larsh@apache.org>
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Sent: Friday, December 6, 2013 5:10 PM
> Subject: Re: HBase returns old values even with max versions = 1
>
>
> The old versions can still be around until a flush and/or compaction.
>
> During a user-level scan, HBase first filters by timerange and then counts
> the versions.
> I agree, this is counterintuitive in this case. In other cases people
> want to first limit by timerange, and then get x number of versions back.
> We might need to start distinguishing between the number of versions
> configured for the column family and the number of versions configured for
> the scan.
>
> Mind filing a jira? Can discuss solutions there.
>
> Thanks.
>
> -- Lars
>
>
>
> ________________________________
>
> From: Niels Basjes <Niels@basjes.nl>
> To: user <user@hbase.apache.org>
> Sent: Friday, December 6, 2013 8:05 AM
> Subject: HBase returns old values even with max versions = 1
>
>
> Hi,
>
> I want to find the columns that have not been updated for more
> than a specific time period.
>
> So I want to do a scan against the columns with a timerange.
> The normal behavior of HBase is that you then get the latest value in that
> time range (which is not what I want).
>
> As far as I understand the way HBase should work is that if you set the
> maximum number of versions for the values in a column family to '1' it
> should retain only the last value that was put into the cell.
>
> What I found is different.
>
> If I run the following commands in the hbase shell
>
>     create 't1', {NAME => 'c1', VERSIONS => 1}
>     put 't1', 'r1', 'c1', 'One', 1000
>     put 't1', 'r1', 'c1', 'Two', 2000
>     put 't1', 'r1', 'c1', 'Three', 3000
>     get 't1', 'r1'
>     get 't1', 'r1' , {TIMERANGE => [0,1500]}
>
> the result is this:
>
>     get 't1', 'r1'
>     COLUMN                     CELL
>      c1:                       timestamp=3000, value=Three
>     1 row(s) in 0.0780 seconds
>
>     get 't1', 'r1' , {TIMERANGE => [0,1500]}
>     COLUMN                     CELL
>      c1:                       timestamp=1000, value=One
>     1 row(s) in 0.1390 seconds
>
> Why does the second query return a value even though I've set the max
> versions to only 1?
> I expect that it only 'knows' about the latest value ('Three') and thus
> should return an empty result in the above example.
> What is the correct way to obtain what I'm looking for?
>
> My current workaround is that I simply retrieve the latest value for all my
> columns and filter them in my application code.
>
> The HBase version I currently have installed here is HBase 0.94.6-cdh4.4.0
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
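
To make the difference between the two orderings discussed above concrete, here is a small standalone simulation. It is only a sketch: it is not ScanQueryMatcher code, it works on a whole list at once instead of one KV at a time, and the names `scan_current`, `scan_proposed`, and the `(column, timestamp, value)` tuple layout are assumptions made for illustration. The time range is treated as minimum inclusive, maximum exclusive.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Cell = Tuple[str, int, str]  # (column, timestamp, value); illustrative layout


def _newest_first(cells: List[Cell]) -> List[Cell]:
    # Order by column, then newest timestamp first within each column.
    return sorted(cells, key=lambda c: (c[0], -c[1]))


def _limit_versions(cells: List[Cell], max_versions: int) -> List[Cell]:
    """Keep at most max_versions cells per column; input must be newest first."""
    seen: Dict[str, int] = {}
    kept: List[Cell] = []
    for cell in cells:
        count = seen.get(cell[0], 0)
        if count < max_versions:
            kept.append(cell)
            seen[cell[0]] = count + 1
    return kept


def _common_filters(
    cells: List[Cell],
    time_range: Tuple[int, int],
    columns: Optional[Set[str]],
    custom_filter: Optional[Callable[[Cell], bool]],
) -> List[Cell]:
    """Steps 2 to 4: time range, requested columns, custom filter."""
    lo, hi = time_range
    out = [c for c in cells if lo <= c[1] < hi]
    if columns is not None:
        out = [c for c in out if c[0] in columns]
    if custom_filter is not None:
        out = [c for c in out if custom_filter(c)]
    return out


def scan_current(cells, cf_versions, scan_versions, time_range,
                 columns=None, custom_filter=None) -> List[Cell]:
    """Current order: the version limit is applied last, after the time range."""
    versions = min(cf_versions, scan_versions)
    filtered = _common_filters(_newest_first(cells), time_range, columns, custom_filter)
    return _limit_versions(filtered, versions)


def scan_proposed(cells, cf_versions, scan_versions, time_range,
                  columns=None, custom_filter=None) -> List[Cell]:
    """Proposed order: CF version limit first, scan version limit last."""
    visible = _limit_versions(_newest_first(cells), cf_versions)
    filtered = _common_filters(visible, time_range, columns, custom_filter)
    return _limit_versions(filtered, scan_versions)


# The three puts from the shell session, all still present because no
# flush or compaction has removed the older versions yet.
cells = [("c1:", 1000, "One"), ("c1:", 2000, "Two"), ("c1:", 3000, "Three")]
print(scan_current(cells, 1, 1, (0, 1500)))
print(scan_proposed(cells, 1, 1, (0, 1500)))
```

With VERSIONS => 1 and TIMERANGE => [0,1500], the current order returns the cell with value 'One', matching the shell output in the original report, while the proposed order returns nothing because 'Three' is the only version the column family should expose and it falls outside the range.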



-- 
Best regards / Met vriendelijke groeten,

Niels Basjes
