hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: querying data on the basis of timestamp
Date Thu, 14 Mar 2013 23:03:51 GMT
What you are asking about looks similar to this:
HBASE-5010 Filter HFiles based on TTL

It went into 0.94.0.


On Thu, Mar 14, 2013 at 3:53 PM, Pankaj Gupta <pankaj.roark@gmail.com> wrote:

> Hi,
> I have a question regarding query performance for rows greater than a
> timestamp. The use case is this:
> I want to find all the rows in a key range that changed after a
> certain timestamp and up to a certain timestamp, i.e. using exactly this
> Scan API:
>     Scan setTimeRange(long minStamp, long maxStamp)
>         Get versions of columns only within the specified timestamp
>         range, [minStamp, maxStamp)
> Would this query go through all the rows in the key range, or is there an
> optimization that makes it faster?
> I ask because I read about such an optimization in the following paper:
> http://oss.csie.fju.edu.tw/~tzu98/Apache%20Hadoop%20Goes%20Realtime%20at%20Facebook.pdf
> Here is the excerpt:
> "For data stored in HBase that is time-series or contains a specific,
> known timestamp, a special timestamp file selection algorithm
> was added. Since time moves forward and data is rarely inserted
> at a significantly later time than its timestamp, each HFile will
> generally contain values for a fixed range of time. This
> information is stored as metadata in each HFile and queries that
> ask for a specific timestamp or range of timestamps will check if
> the request intersects with the ranges of each file, skipping those
> which do not overlap. "
> This would work perfectly for my use case, but I don't know whether this
> optimization, or any other for this use case, exists in Apache HBase.
> The version of Apache HBase we are currently using is 0.92.1, but we are
> considering moving to 0.94.
> Thanks,
> Pankaj
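
To make the file selection idea from the quoted excerpt concrete, here is a minimal sketch in plain Java. It is not HBase's actual code: the class FileTimeRange, its fields, and the method names are hypothetical stand-ins for the per-HFile time-range metadata the paper describes. It assumes each file records the smallest and largest cell timestamp it holds, and that the query range is half-open, [minStamp, maxStamp), as in the setTimeRange description quoted above.

```java
import java.util.ArrayList;
import java.util.List;

final class TimeRangeFileSelection {

    // Hypothetical stand-in for an HFile's time-range metadata; not an HBase class.
    static final class FileTimeRange {
        final String name;
        final long minTs; // smallest cell timestamp in the file (inclusive)
        final long maxTs; // largest cell timestamp in the file (inclusive)

        FileTimeRange(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    // True when the file may hold a cell inside the half-open range [minStamp, maxStamp).
    static boolean overlaps(FileTimeRange f, long minStamp, long maxStamp) {
        return f.maxTs >= minStamp && f.minTs < maxStamp;
    }

    // Returns the names of the files that still have to be read; the rest are skipped.
    static List<String> select(List<FileTimeRange> files, long minStamp, long maxStamp) {
        List<String> kept = new ArrayList<>();
        for (FileTimeRange f : files) {
            if (overlaps(f, minStamp, maxStamp)) {
                kept.add(f.name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<FileTimeRange> files = new ArrayList<>();
        files.add(new FileTimeRange("hfile-a", 100L, 199L));
        files.add(new FileTimeRange("hfile-b", 200L, 299L));
        files.add(new FileTimeRange("hfile-c", 300L, 399L));

        // Query range [250, 300): only hfile-b overlaps it.
        System.out.println(select(files, 250L, 300L));
    }
}
```

The check only rules files out. A file that overlaps the range still has to be scanned, so the benefit depends on how well the timestamps in each file cluster, which is the "time moves forward" assumption in the excerpt.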
