hbase-user mailing list archives

From Peter Wolf <opus...@gmail.com>
Subject Re: Scanning the last N rows
Date Fri, 02 Mar 2012 21:31:19 GMT
Thanks Shaneal,

My rows are created by customer interaction.  Unfortunately, I am not 
interested in rows from a region of time (i.e. "now" .. "a month ago").
Instead I want the last N interactions.

Let's say I incorporated an interaction count into the key, and I want 
to get the most recent 1000 rows.  I can then do a simple scan with start 
and stop partial row keys.
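
As a rough sketch of that scan (plain Python over a sorted list standing in for the table, not the HBase client API): the key layout, the zero-padding width, and the helper names `row_key` and `scan` are assumptions for illustration only. The count is zero-padded so that lexicographic key order matches numeric order, and the start and stop keys are partial keys containing only the count.

```python
import bisect

def row_key(count: int, interaction_id: str) -> str:
    # Hypothetical layout: zero-padded count first, so keys sort numerically.
    return f"{count:010d}-{interaction_id}"

def scan(sorted_keys, start, stop):
    """Return keys in [start, stop), mimicking a start/stop row scan."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]

# 5000 interactions, stored in sorted key order as a table would be.
keys = sorted(row_key(i, f"id{i}") for i in range(1, 5001))

latest = 5000   # assumed to be known here; obtaining it is the open question
n = 1000

# Partial keys: only the count portion, no interaction id.
start = f"{latest - n + 1:010d}"
stop = f"{latest + 1:010d}"
last_n = scan(keys, start, stop)
```

The sketch only works because `latest` is already in hand, which is exactly the missing piece.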

But how do I get the interaction count value of the most recent row?


On 3/2/12 4:20 PM, Shaneal Manek wrote:
> Assuming your rowkey doesn't somehow encode the time that row was
> created (in which case you can simply do a scan), things get a bit
> more interesting.
> The 'easiest' approach is probably to Scan, but use a custom filter
> that only allows in 'recent' rows based on their timestamp (see the
> TimestampsFilter for an example of how to do this - it isn't exactly
> what you need, but should show you how) so that you expect at least N
> rows to match. Then, if your scan matched at least N rows, you can sort
> and take the top N client side. If your scan retrieved fewer than N
> rows, you'll have to go back and do another scan with a different
> timestamp filter and aggregate/sort the results from the multiple
> scans.
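
A minimal sketch of the scan-and-widen idea, using an in-memory list of (rowkey, timestamp) pairs in place of a real table and filter. The function names, the doubling strategy, and the `max_window` cutoff are assumptions; this variant also rescans with a wider window instead of aggregating results from several scans, which is simpler but repeats work.

```python
def scan_since(rows, min_ts):
    # Stand-in for a Scan with a filter that only admits rows whose
    # timestamp is at least min_ts.
    return [(key, ts) for key, ts in rows if ts >= min_ts]

def last_n(rows, n, now, window, max_window):
    """Widen the time window until at least n rows match, then sort client side."""
    while True:
        matched = scan_since(rows, now - window)
        if len(matched) >= n or window >= max_window:
            break
        window = min(window * 2, max_window)  # assumed widening strategy
    matched.sort(key=lambda r: r[1], reverse=True)  # newest first
    return matched[:n]

# 100 rows with timestamps 1000, 1010, ..., 1990.
rows = [(f"row{i}", 1000 + 10 * i) for i in range(100)]
recent = last_n(rows, n=5, now=2000, window=20, max_window=2000)
```
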
> The more efficient approach might be to create a second table as a
> 'recency' index. Let's pretend your data table is called 'd'. Then,
> you'd create a second table called 'dri' (data recency index). Every
> time you insert a row into 'd' with a rowkey of 'r', you also insert a
> row into 'dri' with a rowkey of the current timestamp, and only one
> column (say, called 'dr') with a value of 'r'. Then, when you want to
> retrieve the last N rows, you can look at the last N rows in the dri
> table, and GET the rows from the 'd' table with row keys matching the
> column values in 'dr'. You can automate some of this with coprocessors
> too.
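
The recency index can be sketched with two dictionaries standing in for the 'd' and 'dri' tables. This is an illustration, not HBase client code: `put` and `last_n_rows` are hypothetical helpers, and the sketch assumes every insert has a unique timestamp (two inserts with the same timestamp would overwrite each other in 'dri').

```python
d = {}    # data table: rowkey -> row contents
dri = {}  # data recency index: timestamp -> {"dr": rowkey}

def put(rowkey, row, ts):
    # Every insert into 'd' also writes an index row keyed by the timestamp.
    d[rowkey] = row
    dri[ts] = {"dr": rowkey}

def last_n_rows(n):
    """Read the last n index rows, then GET the matching rows from 'd'."""
    if n <= 0:
        return []
    newest = sorted(dri)[-n:]  # last n timestamps in key order
    return [d[dri[ts]["dr"]] for ts in reversed(newest)]  # newest first

put("r1", {"msg": "first"}, ts=100)
put("r2", {"msg": "second"}, ts=200)
put("r3", {"msg": "third"}, ts=300)
latest_two = last_n_rows(2)
```
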
> Of course, the easiest way is to simply make the most significant bits
> of your rowkey in your actual data be a timestamp, but I don't know if
> your schema would allow that.
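
One common variation on a timestamp-prefixed key, not spelled out in this thread, is to store a reversed timestamp (a maximum value minus the timestamp) so that the newest rows sort first and the last N rows are simply the first N of a scan. A sketch under that assumption, with a sorted list in place of the table and a hypothetical key layout:

```python
MAX_TS = 2**63 - 1  # assumed upper bound, matching a signed 64-bit long

def recency_key(ts: int, row_id: str) -> str:
    # Reversed, zero-padded timestamp as the most significant part of the key,
    # so newer rows sort before older ones.
    return f"{MAX_TS - ts:019d}:{row_id}"

# Keys stored in sorted order, as a table would keep them.
table = sorted(recency_key(1000 + i, f"row{i}") for i in range(50))

n = 3
newest = table[:n]  # the first n keys of a scan are the most recent n rows
newest_ids = [k.split(":", 1)[1] for k in newest]
```
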
> -Shaneal
> On Fri, Mar 2, 2012 at 1:02 PM, Peter Wolf<opus111@gmail.com>  wrote:
>> Hello all,
>> I want to retrieve the most recent N rows from a table, with some column
>> qualifiers.
>> I can't find a Filter, or anything obvious in my books, or via Google.
>> What is the idiom for doing this?
>> Thanks
>> Peter
