hbase-user mailing list archives

From Stuart Smith <stu24m...@yahoo.com>
Subject Re: Questions on timestamps, insights on how timerange/timestamp filter are processed?
Date Wed, 14 Dec 2011 23:37:12 GMT
Ah, thanks for correcting my wrong answer!

The only time I had to deal with timestamps I had to go through the thrift API ...
Never noticed the setTimeRange in the Scan() java API :)

So now I'm curious: if I use this and it can't skip any HFiles, is there any performance gain
over doing the filtering client side?
Or is it basically the same amount of work, a full scan that checks and skips timestamps?

Take care,

 From: Carson Hoffacker <choffacker@gmail.com>
To: user@hbase.apache.org; Stuart Smith <stu24mail@yahoo.com> 
Sent: Wednesday, December 14, 2011 10:29 AM
Subject: Re: Questions on timestamps, insights on how timerange/timestamp filter are processed?
The timerange scan is able to leverage metadata in each of the HFiles. Each
HFile should store information about the timerange associated with the data
within the HFile. If the timerange associated with the HFile does not
overlap the timerange you are interested in, that HFile will be
skipped completely. This can significantly increase scan performance.
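A minimal sketch of the file-skipping idea described above, written in Java to match the Scan API mentioned in this thread. It is a toy model, not HBase's implementation: FileMeta, overlaps and filesToRead are hypothetical names. The scan range is treated as the half-open interval [minStamp, maxStamp), which is how the Scan.setTimeRange documentation describes it. A file is skipped only when none of its timestamps can fall inside that interval.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not HBase code) of skipping files using per-file time-range
 * metadata. Each file records the smallest and largest cell timestamp it
 * holds; a scan asks for the half-open range [scanMin, scanMax).
 */
public class TimeRangeSkip {

    // Hypothetical per-file metadata: smallest and largest timestamp stored.
    record FileMeta(String name, long minTs, long maxTs) {}

    // True if some timestamp t with minTs <= t <= maxTs satisfies scanMin <= t < scanMax.
    static boolean overlaps(FileMeta f, long scanMin, long scanMax) {
        return f.minTs() < scanMax && f.maxTs() >= scanMin;
    }

    // Returns the names of the files the scan still has to read; the rest are skipped.
    static List<String> filesToRead(List<FileMeta> files, long scanMin, long scanMax) {
        List<String> result = new ArrayList<>();
        for (FileMeta f : files) {
            if (overlaps(f, scanMin, scanMax)) {
                result.add(f.name());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<FileMeta> files = List.of(
            new FileMeta("hfile-a", 100, 199),
            new FileMeta("hfile-b", 200, 299),
            new FileMeta("hfile-c", 300, 399));
        // Scan for [250, 320): hfile-a holds only older data and is skipped.
        System.out.println(filesToRead(files, 250, 320)); // [hfile-b, hfile-c]
    }
}
```

Note that this only decides which files are opened; cells inside a file that is read still have to be checked against the range individually.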

However, when these files get compacted and the data is merged into a
smaller number of files, the time range associated with each file
increases. I don't think it works this way out of the box, but I believe
you can be smart about how you manage compactions over time to get the
behavior that you want. You could have compactions compact all the data
from January 2011 into a single file, and then compact all the data from
February 2011 into a different file.
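The effect of compactions on skipping can be sketched with another toy model (hypothetical names, not HBase code). Merging every file yields one file whose range is the union of its inputs, while merging only files from the same time bucket keeps each output file's range narrow. Here a bucket is assumed to be a fixed-width span of timestamps standing in for a calendar month, and each flushed file is assumed to fall inside a single bucket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model (not HBase code): merging files widens the time range each
 * file covers, so fewer files can be skipped. Merging per time bucket
 * keeps the ranges narrow.
 */
public class CompactionRanges {

    // Hypothetical per-file metadata: smallest and largest timestamp stored.
    record FileMeta(long minTs, long maxTs) {}

    // True if the file may hold a timestamp in the half-open range [scanMin, scanMax).
    static boolean overlaps(FileMeta f, long scanMin, long scanMax) {
        return f.minTs() < scanMax && f.maxTs() >= scanMin;
    }

    // Merging files produces one file spanning the union of their ranges.
    static FileMeta merge(List<FileMeta> files) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (FileMeta f : files) {
            min = Math.min(min, f.minTs());
            max = Math.max(max, f.maxTs());
        }
        return new FileMeta(min, max);
    }

    // Hypothetical bucketed compaction: merge only files whose minTs is in the same bucket.
    static List<FileMeta> mergeByBucket(List<FileMeta> files, long bucketSize) {
        Map<Long, List<FileMeta>> buckets = new TreeMap<>();
        for (FileMeta f : files) {
            buckets.computeIfAbsent(f.minTs() / bucketSize, k -> new ArrayList<>()).add(f);
        }
        List<FileMeta> merged = new ArrayList<>();
        for (List<FileMeta> group : buckets.values()) {
            merged.add(merge(group));
        }
        return merged;
    }

    // Number of files a scan over [scanMin, scanMax) can skip entirely.
    static long countSkipped(List<FileMeta> files, long scanMin, long scanMax) {
        return files.stream().filter(f -> !overlaps(f, scanMin, scanMax)).count();
    }

    public static void main(String[] args) {
        // Four flushed files: two in bucket 0 (ts 0..99), two in bucket 1 (ts 100..199).
        List<FileMeta> flushed = List.of(
            new FileMeta(0, 40), new FileMeta(50, 99),
            new FileMeta(100, 140), new FileMeta(150, 199));

        FileMeta single = merge(flushed);                     // covers 0..199
        List<FileMeta> byBucket = mergeByBucket(flushed, 100); // 0..99 and 100..199

        // A scan over [120, 200) cannot skip the single big file,
        // but it can skip the bucket-0 file.
        System.out.println(countSkipped(List.of(single), 120, 200)); // 0
        System.out.println(countSkipped(byBucket, 120, 200));        // 1
    }
}
```

With one fully merged file, a scan for recent data has to read everything; with per-bucket files, the older bucket is skipped. Whether a particular HBase version offers a compaction policy that groups files this way is outside what this sketch shows.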


On Wed, Dec 14, 2011 at 9:39 AM, Stuart Smith <stu24mail@yahoo.com> wrote:

> Hello Thomas,
>    Someone here could probably provide more help, but to start you off,
> the only way I've filtered timestamps is to do a scan, and just filter out
> rows one by one. This definitely sounds like something coprocessors could
> help with, but I don't really understand those yet, so someone else will
> have to step up.. or you can really dig into the documentation about them
> (AFAIK, it's a little bit of custom code that runs on the regionservers
> that can pre-process your gets.. but don't quote me on that!).
> But I can say that a major compaction should not affect them - I've never
> seen it happen, and if it does, I believe that's a bug.
> Take care,
>   -stu
> ________________________________
>  From: Steinmaurer Thomas <Thomas.Steinmaurer@scch.at>
> To: user@hbase.apache.org
> Sent: Wednesday, December 14, 2011 12:38 AM
> Subject: Questions on timestamps, insights on how timerange/timestamp
> filter are processed?
> Hello,
> can anybody share some insights on how timerange/timestamp filters are
> processed?
> Basically, we intend to use timerange/timestamp filters to process relatively
> new data, based on the insertion timestamp.
> - How does the process of skipping records and/or regions work if one
> uses timerange filters?
> - I also wonder: do timestamps change when, e.g., running a major
> compaction?
> - If data grows over the years, is there any chance that regions with
> "older" rows stay "stable" in a way that they can be skipped very
> quickly when querying data with a timerange filter of, e.g., the last
> three years?
> Thanks,
> Thomas
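Stuart's quoted suggestion (scan everything and filter rows one by one on the client) can be modeled the same way. In this hypothetical sketch, which is not an HBase API, every scanned cell is counted as inspected. That makes visible the cost the thread asks about: the client handles every cell it receives, matching or not, using the same half-open [minStamp, maxStamp) range as above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not HBase code) of client-side timestamp filtering: every
 * scanned cell reaches the client, which keeps only those whose
 * timestamp lies in [minStamp, maxStamp).
 */
public class ClientSideFilter {

    // Hypothetical cell: a row key plus its timestamp.
    record Cell(String row, long timestamp) {}

    // Holds the matching cells and how many cells had to be examined.
    static final class Result {
        final List<Cell> kept = new ArrayList<>();
        int inspected = 0;
    }

    static Result filter(List<Cell> scanned, long minStamp, long maxStamp) {
        Result r = new Result();
        for (Cell c : scanned) {
            r.inspected++; // the client pays for every cell it receives
            if (c.timestamp() >= minStamp && c.timestamp() < maxStamp) {
                r.kept.add(c);
            }
        }
        return r;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("row1", 100), new Cell("row2", 250), new Cell("row3", 310));
        Result r = filter(cells, 200, 300);
        System.out.println(r.kept.size() + " kept of " + r.inspected + " inspected"); // 1 kept of 3 inspected
    }
}
```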