hbase-user mailing list archives

From Carson Hoffacker <choffac...@gmail.com>
Subject Re: Questions on timestamps, insights on how timerange/timestamp filter are processed?
Date Thu, 15 Dec 2011 04:36:04 GMT
I believe it's the same amount of work.

On Wed, Dec 14, 2011 at 3:37 PM, Stuart Smith <stu24mail@yahoo.com> wrote:

> Ah. Thanks for clarifying my wrong answer.. !
>
> The only time I had to deal with timestamps I had to go through the thrift
> API ...
> Never noticed the setTimeRange in the Scan() java API :)
>
> So now I'm curious.. If I use this and it can't skip HFiles.. is there any
> performance gain from doing this vs doing it client side?
> Or is it basically the same amount of work - a full scan checking &
> skipping timestamps.. ?
>
>
> Take care,
>   -stu
>
>
>
> ________________________________
>  From: Carson Hoffacker <choffacker@gmail.com>
> To: user@hbase.apache.org; Stuart Smith <stu24mail@yahoo.com>
> Sent: Wednesday, December 14, 2011 10:29 AM
> Subject: Re: Questions on timestamps, insights on how timerange/timestamp
> filter are processed?
>
> The timerange scan is able to leverage metadata in each of the HFiles. Each
> HFile should store information about the timerange associated with the data
> within the HFile. If the timerange associated with the HFile does not
> overlap the timerange you are interested in, that HFile will be
> skipped completely. This can significantly increase scan performance.
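
The file-level skipping described above can be illustrated with a small, JDK-only simulation. This is a sketch under stated assumptions, not HBase's actual code. `FileSketch`, `overlaps` and `scan` are hypothetical names. Each "file" is assumed to carry the min/max timestamp of its cells. The query range is treated as `[start, end)`, mirroring the inclusive/exclusive convention of the real client call `Scan.setTimeRange(minStamp, maxStamp)`. A file is skipped only when its range does not overlap the query. Cells in the remaining files are still checked one by one.

```java
import java.util.ArrayList;
import java.util.List;

public class TimeRangeSkipSketch {

    // Hypothetical stand-in for an HFile: cell timestamps plus the
    // min/max timestamp metadata the file is assumed to carry.
    record FileSketch(String name, long minTs, long maxTs, List<Long> cellTimestamps) {
        static FileSketch of(String name, List<Long> ts) {
            long min = ts.stream().mapToLong(Long::longValue).min().orElse(Long.MAX_VALUE);
            long max = ts.stream().mapToLong(Long::longValue).max().orElse(Long.MIN_VALUE);
            return new FileSketch(name, min, max, ts);
        }
    }

    // True if the file's [minTs, maxTs] range intersects the query range [start, end).
    static boolean overlaps(FileSketch f, long start, long end) {
        return f.minTs() < end && f.maxTs() >= start;
    }

    // Returns matching timestamps. Non-overlapping files are never read;
    // their names are collected in 'skipped'.
    static List<Long> scan(List<FileSketch> files, long start, long end, List<String> skipped) {
        List<Long> out = new ArrayList<>();
        for (FileSketch f : files) {
            if (!overlaps(f, start, end)) {
                skipped.add(f.name()); // metadata alone rules this file out
                continue;
            }
            for (long ts : f.cellTimestamps()) { // per-cell check inside surviving files
                if (ts >= start && ts < end) out.add(ts);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<FileSketch> files = List.of(
            FileSketch.of("old", List.of(100L, 150L, 190L)),
            FileSketch.of("new", List.of(900L, 950L, 990L)));
        List<String> skipped = new ArrayList<>();
        List<Long> hits = scan(files, 900L, 1000L, skipped);
        System.out.println("skipped=" + skipped + " hits=" + hits); // skipped=[old] hits=[900, 950, 990]
    }
}
```

The sketch also hints at the answer to the follow-up question further up the thread. When no file can be ruled out by its metadata, every cell still has to be compared against the range.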
>
> However, when these files get compacted and the data is merged into a
> smaller number of files, the time range associated with each file
> increases. I don't think it works this way out of the box, but I believe
> you can be smart about how you manage compactions over time to get the
> behavior that you want. You could have compactions compact all the data
> from January 2011 into a single file, and then compact all the data from
> February 2011 into a different file.
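
The effect of compaction on skipping can be sketched the same way. This is a hypothetical JDK-only model, not an HBase compaction policy. `FileRange`, `merge`, `compactByBucket` and `countReadable` are illustrative names. It assumes each input file lies within a single bucket (for example, one month). Merging everything into one file widens its range, so a narrow query can no longer skip it. Merging per bucket keeps each output range narrow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MonthlyCompactionSketch {

    // Hypothetical file: only its [minTs, maxTs] timestamp metadata matters here.
    record FileRange(long minTs, long maxTs) {}

    // Merging files yields a range covering all inputs (the range "widens").
    static FileRange merge(List<FileRange> inputs) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (FileRange f : inputs) {
            min = Math.min(min, f.minTs());
            max = Math.max(max, f.maxTs());
        }
        return new FileRange(min, max);
    }

    // Group files by bucket of their min timestamp, then merge each group separately.
    // Assumes no input file spans two buckets.
    static List<FileRange> compactByBucket(List<FileRange> inputs, long bucketWidth) {
        Map<Long, List<FileRange>> groups = new TreeMap<>();
        for (FileRange f : inputs) {
            groups.computeIfAbsent(f.minTs() / bucketWidth, k -> new ArrayList<>()).add(f);
        }
        List<FileRange> out = new ArrayList<>();
        for (List<FileRange> g : groups.values()) out.add(merge(g));
        return out;
    }

    // Number of files that must be read for the query range [start, end).
    static long countReadable(List<FileRange> files, long start, long end) {
        return files.stream().filter(f -> f.minTs() < end && f.maxTs() >= start).count();
    }

    public static void main(String[] args) {
        // Timestamps 0..99 stand for "January", 100..199 for "February".
        List<FileRange> inputs = List.of(
            new FileRange(0, 40), new FileRange(50, 99),
            new FileRange(100, 140), new FileRange(150, 199));
        List<FileRange> single = List.of(merge(inputs));
        List<FileRange> monthly = compactByBucket(inputs, 100);
        System.out.println("single file: read " + countReadable(single, 150, 200) + " of " + single.size());
        System.out.println("per month:   read " + countReadable(monthly, 150, 200) + " of " + monthly.size());
    }
}
```

With the query `[150, 200)`, the single merged file must be read. With the per-month layout, the "January" file is skipped from its metadata alone.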
>
> -Carson
>
> On Wed, Dec 14, 2011 at 9:39 AM, Stuart Smith <stu24mail@yahoo.com> wrote:
>
> > Hello Thomas,
> >
> >    Someone here could probably provide more help, but to start you off,
> > the only way I've filtered timestamps is to do a scan and just filter out
> > rows one by one. This definitely sounds like something coprocessors could
> > help with, but I don't really understand those yet, so someone else will
> > have to step up.. or you can really dig into the documentation about them
> > (AFAIK, it's a little bit of custom code that runs on the regionservers
> > that can pre-process your gets.. but don't quote me on that!).
> >
> > But I can say that a major compaction should not affect them - I've never
> > seen it happen, and if it does, I believe that's a bug.
> >
> > Take care,
> >   -stu
> >
> >
> >
> > ________________________________
> >  From: Steinmaurer Thomas <Thomas.Steinmaurer@scch.at>
> > To: user@hbase.apache.org
> > Sent: Wednesday, December 14, 2011 12:38 AM
> > Subject: Questions on timestamps, insights on how timerange/timestamp
> > filter are processed?
> >
> > Hello,
> >
> > can anybody share some insights on how timerange/timestamp filters are
> > processed?
> >
> > Basically we intend to use timerange/timestamp filters to process fairly
> > recent data, from an insertion-timestamp POV.
> >
> > - How does the process of skipping records and/or regions work if one
> > uses timerange filters?
> > - I also wonder, do timestamps change when, e.g., running a major
> > compaction?
> > - If data grows over the years, is there any chance that regions with
> > "older" rows stay "stable" in a way that lets them be skipped very
> > quickly when querying data with a timerange filter of, e.g., the last
> > three years?
> >
> > Thanks,
> > Thomas
> >
>
