hbase-user mailing list archives

From n keywal <nkey...@gmail.com>
Subject Re: Performance of scan setTimeRange VS manually doing it
Date Wed, 12 Sep 2012 22:08:40 GMT
Each file has a time range. When you scan or search, a file is skipped if
its time range does not overlap the time range of the query. Because other
parameters are involved as well (row distribution, compaction effects,
cache, bloom filters, ...), it is difficult to know in advance exactly what
will happen. But specifying a time range certainly does no harm, as long as
it matches your functional needs.

That said, if you already have the row key, the time range is less
useful, as you will already skip a lot of files.
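
The file-skipping check described above can be sketched roughly as follows.
This is only an illustration, not HBase source code: the class
TimeRangeSkipSketch, the FileInfo type and the filesToRead helper are
hypothetical names, and the sketch assumes each file tracks an inclusive
min/max timestamp while the query range is half-open, [min, max), as with
Scan.setTimeRange.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of time-range based file skipping; not HBase code.
final class TimeRangeSkipSketch {

    // Hypothetical stand-in for a store file and the smallest and largest
    // cell timestamps it contains (both inclusive).
    static final class FileInfo {
        final String name;
        final long minTs;
        final long maxTs;

        FileInfo(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    // True if the file may hold cells inside the query range
    // [queryMin, queryMax). The upper bound of the query is exclusive.
    static boolean overlaps(FileInfo file, long queryMin, long queryMax) {
        return file.maxTs >= queryMin && file.minTs < queryMax;
    }

    // Returns the names of the files that cannot be skipped for this query.
    static List<String> filesToRead(List<FileInfo> files, long queryMin, long queryMax) {
        List<String> selected = new ArrayList<>();
        for (FileInfo file : files) {
            if (overlaps(file, queryMin, queryMax)) {
                selected.add(file.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<FileInfo> files = new ArrayList<>();
        files.add(new FileInfo("older-file", 100L, 200L));
        files.add(new FileInfo("recent-file", 300L, 500L));

        // Only files whose range overlaps [250, 400) need to be opened.
        System.out.println(filesToRead(files, 250L, 400L));
    }
}
```

Whether skipping files this way actually helps depends on how timestamps
are spread across the files: if every file spans a wide time range, few or
none of them can be skipped.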

On Wed, Sep 12, 2012 at 11:52 PM, Tom Brown <tombrown52@gmail.com> wrote:

> When I query HBase, I always include a time range. This has not been a
> problem when querying recent data, but it seems to be an issue when I
> query older data (a few hours old). All of my row keys include the
> timestamp as part of the key (this value is the same as the HBase
> timestamp for the row).  I recently tried an experiment where I
> manually re-seek to the possible row (based on the timestamp as part
> of the row key) instead of using "setTimeRange" on my scan object and
> was amazed to see that there was no degradation for older data.
> Can someone postulate a theory as to why this might be happening? I'm
> happy to provide extra data if it will help you theorize...
> Is there a downside to stopping using "setTimeRange"?
> --Tom
