hbase-user mailing list archives

From Leonardo Gamas <leoga...@jusbrasil.com.br>
Subject Re: Yet another "get the last 100 rows" question...
Date Thu, 05 Jan 2012 22:23:12 GMT
1) Filters are applied directly on the RegionServer:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html

2) You can reverse the timestamp:
http://hbase.apache.org/book/rowkey.design.html#reverse.timestamp
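A minimal sketch of the reverse-timestamp trick: subtracting the epoch millis from Long.MAX_VALUE produces a value that decreases as time increases, so a plain ascending scan returns the newest rows first (class and helper names here are illustrative, not from HBase):

```java
// Sketch: encode a reverse timestamp so the newest rows sort first.
// Long.MAX_VALUE - ts is monotonically decreasing in ts, so an ordinary
// ascending scan over <accountID><reverseTs> keys yields newest-first.
public class ReverseTs {
    static long reverse(long ts) {
        return Long.MAX_VALUE - ts;
    }

    public static void main(String[] args) {
        long older = 1325802192000L;      // Thu, 05 Jan 2012
        long newer = older + 60_000L;     // one minute later
        // The newer event gets the smaller key, so it sorts first.
        System.out.println(ReverseTs.reverse(newer) < ReverseTs.reverse(older));
    }
}
```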

So your row key becomes: <accountID><reverse timestamp>
In the scan, set the caching attribute to 100 so the matching rows are
transferred to the client in a single round trip. But keep count of how
many times you call next() on the scanner, so you don't exceed the cache
and trigger another call to the RegionServer.
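Putting the two points together, a scan with a server-side PageFilter and matching caching might look like the sketch below (written against the 0.92-era client API current at the time; the table name, account ID, and stop-row construction are illustrative assumptions, and PageFilter is applied per region, so the client still caps the count itself):

```java
// Sketch (0.92-era HBase client API): fetch the newest 100 rows for one
// account. Row keys are <accountID><reverseTs>; PageFilter runs on the
// RegionServer, and caching=100 fetches the whole page in one round trip.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PageFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class Last100Scan {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "interactions"); // illustrative table

        byte[] accountId = Bytes.toBytes("account-42");  // illustrative account
        Scan scan = new Scan();
        scan.setStartRow(accountId);
        // Exclusive stop row: every reverse timestamp is < Long.MAX_VALUE,
        // so this bound covers all rows for the account and nothing more.
        scan.setStopRow(Bytes.add(accountId, Bytes.toBytes(Long.MAX_VALUE)));
        scan.setFilter(new PageFilter(100)); // evaluated on the RegionServer
        scan.setCaching(100);                // one RPC for the whole page

        ResultScanner scanner = table.getScanner(scan);
        try {
            int count = 0;
            for (Result r : scanner) {
                if (++count > 100) break;    // PageFilter is per-region; cap here
                // process r ...
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}
```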

2012/1/5 Peter Wolf <opus111@gmail.com>

> Hello all,
>
> I am a new HBase user with a familiar problem.  I need to efficiently
> return the last 100 rows from an account.  I searched the archives, and
> read the book, but did not find a complete answer.
>
> I have a table of interactions with my users.  One row per interaction.
>
> I am using a composite Row Key of the form
>
> <accountID><timestamp>
>
> So using partial row key scans I can efficiently get all the rows for an
> account.
>
> Unfortunately, I do not know how to relate row count to timestamp, so I
> have to get all the rows.  I then use a PageFilter to get only the last 100.
>
> However, I believe that Filters operate on the Client side, so all of the
> rows get transmitted.  I believe this is not efficient.
>
> I have two questions--
>
> 1) Am I correct that my solution is not efficient, and I need to filter at
> the Server?
> 2) If so, is there a "best practice" for this problem?
>
> Thanks in advance
> Peter
>



-- 

*Leonardo Gamas*
Software Engineer
T +55 (71) 3494-3514
C +55 (75) 8134-7440
leogamas@jusbrasil.com.br
www.jusbrasil.com.br
