hbase-user mailing list archives

From Leonardo Gamas <leoga...@jusbrasil.com.br>
Subject Re: Yet another "get the last 100 rows" question...
Date Thu, 05 Jan 2012 22:54:48 GMT
Yes, the comparisons are made in the RegionServer. If you pass a good
start row it may work, but I haven't done any performance tests with it.
You can always create a secondary index (normally another table whose row
key is <column value> + <reference to the target row key>).
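
A minimal sketch of that secondary-index layout, assuming a NUL character as the separator between the column value and the target row key (so values must never contain it). An in-memory TreeMap stands in for the index table because, like an HBase table, it keeps rows sorted by key; the class and method names are hypothetical and none of this is HBase API. Looking up a value is then a prefix scan over the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory sketch of a secondary-index table. A TreeMap stands in for an
// HBase table because both keep rows sorted by key. The class and method
// names are hypothetical; none of this is HBase API.
class SecondaryIndexSketch {
    // Assumed separator between the column value and the target row key;
    // column values must never contain it.
    static final char SEP = '\u0000';

    // Index row key: <column value> SEP <target row key>
    static String indexKey(String columnValue, String targetRowKey) {
        return columnValue + SEP + targetRowKey;
    }

    // Prefix scan over the index: every key in [value + SEP, value + (SEP + 1)).
    static List<String> lookup(NavigableMap<String, String> index, String columnValue) {
        String start = columnValue + SEP;
        String stop = columnValue + (char) (SEP + 1); // exclusive stop row
        List<String> targets = new ArrayList<>();
        for (String key : index.subMap(start, true, stop, false).keySet()) {
            targets.add(key.substring(start.length())); // strip the value prefix
        }
        return targets;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> index = new TreeMap<>();
        // Each write to the main table is paired with a write to the index table.
        index.put(indexKey("login", "acct0042|row1"), "");
        index.put(indexKey("logout", "acct0042|row2"), "");
        index.put(indexKey("login", "acct0007|row9"), "");
        System.out.println(lookup(index, "login")); // [acct0007|row9, acct0042|row1]
    }
}
```

The separator keeps a lookup for "log" from also matching rows indexed under "login". Keep in mind that in HBase the main-table write and the index-table write are two separate puts, so the application has to tolerate or repair an index row that is briefly missing or stale.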

2012/1/5 Peter Wolf <opus111@gmail.com>

> Ah ha!  Thank you for the prompt and useful response :-)
>
> The reverse timestamp key does the trick.  Thank you!
>
> So, Filters are not run on the Client.  For example, a
> SingleColumnValueFilter does its comparisons on the Server and is
> reasonably efficient.  Is this correct?
>
> P
>
>
>
> On 1/5/12 5:23 PM, Leonardo Gamas wrote:
>
>> 1) Filters are applied directly in the RegionServer:
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html
>>
>> 2) You can reverse the timestamp:
>> http://hbase.apache.org/book/rowkey.design.html#reverse.timestamp
>>
>> So your row key becomes <accountID><reverse timestamp>.
>> In the scan, set the caching attribute to 100 so the matches are
>> transferred to the client in a single round trip, but you need to count
>> the number of times you call next() on the scanner so you don't exceed
>> the cache and trigger a new call to the RegionServer.
>>
>> 2012/1/5 Peter Wolf <opus111@gmail.com>
>>
>>> Hello all,
>>>
>>> I am a new HBase user with a familiar problem.  I need to efficiently
>>> return the last 100 rows from an account.  I searched the archives, and
>>> read the book, but did not find a complete answer.
>>>
>>> I have a table of interactions with my users.  One row per interaction.
>>>
>>> I am using a composite Row Key of the form
>>>
>>> <accountID><timestamp>
>>>
>>> So using partial row key scans I can efficiently get all the rows for an
>>> account.
>>>
>>> Unfortunately, I do not know how to relate row count to timestamp, so I
>>> have to get all the rows.  I then use a PageFilter to get only the last
>>> 100.
>>>
>>> However, I believe that Filters operate on the Client side, so all of the
>>> rows get transmitted.  I believe this is not efficient.
>>>
>>> I have two questions--
>>>
>>> 1) Am I correct that my solution is not efficient, and I need to filter
>>> at the Server?
>>> 2) If so, is there a "best practice" for this problem?
>>>
>>> Thanks in advance
>>> Peter
>>>
>>>
>>
>>
>
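
The reverse-timestamp approach from the thread can be sketched as follows, assuming fixed-width account IDs (with variable-length IDs, a prefix such as acct1 would also match the rows of acct10). A TreeMap ordered by unsigned bytes stands in for the HBase table, and the class and method names are hypothetical. Storing Long.MAX_VALUE - timestamp as an 8-byte big-endian suffix makes the newest row for an account sort first, so "the last 100 rows" becomes "the first 100 rows after the account prefix".

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory sketch of the <accountID><reverse timestamp> row key. A TreeMap
// ordered by unsigned bytes stands in for an HBase table; the class and
// method names are hypothetical. Account IDs are assumed to be fixed-width.
class ReverseTimestampSketch {
    // Row key: account ID bytes followed by the 8-byte big-endian value
    // (Long.MAX_VALUE - timestamp), so newer rows sort first.
    static byte[] rowKey(String accountId, long timestampMillis) {
        byte[] id = accountId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(id.length + Long.BYTES)
                .put(id)
                .putLong(Long.MAX_VALUE - timestampMillis)
                .array();
    }

    // Recover the original timestamp from the last 8 bytes of a row key.
    static long timestampOf(byte[] rowKey) {
        long reversed = ByteBuffer.wrap(rowKey, rowKey.length - Long.BYTES, Long.BYTES).getLong();
        return Long.MAX_VALUE - reversed;
    }

    static boolean startsWith(byte[] key, byte[] prefix) {
        return key.length >= prefix.length
                && Arrays.equals(key, 0, prefix.length, prefix, 0, prefix.length);
    }

    // Start at the account prefix and stop after `limit` rows or when the
    // prefix no longer matches, mirroring a scan that stops calling next().
    static List<byte[]> newestRows(NavigableMap<byte[], String> table, String accountId, int limit) {
        byte[] prefix = accountId.getBytes(StandardCharsets.UTF_8);
        List<byte[]> rows = new ArrayList<>();
        for (byte[] key : table.tailMap(prefix, true).keySet()) {
            if (rows.size() == limit || !startsWith(key, prefix)) {
                break;
            }
            rows.add(key);
        }
        return rows;
    }

    public static void main(String[] args) {
        Comparator<byte[]> unsignedOrder = Arrays::compareUnsigned;
        NavigableMap<byte[], String> table = new TreeMap<>(unsignedOrder);
        // 250 interactions for one account, plus rows for a second account.
        for (long ts = 1_000; ts <= 250_000; ts += 1_000) {
            table.put(rowKey("acct0001", ts), "interaction@" + ts);
            table.put(rowKey("acct0002", ts), "other account");
        }
        List<byte[]> latest = newestRows(table, "acct0001", 100);
        System.out.println(latest.size() + " rows, newest ts=" + timestampOf(latest.get(0))
                + ", oldest ts=" + timestampOf(latest.get(latest.size() - 1)));
        // prints: 100 rows, newest ts=250000, oldest ts=151000
    }
}
```

With the real client, the same idea would be a Scan whose start row is the account ID, with Scan.setCaching(100), stopping after 100 calls to next() on the ResultScanner and then closing it, as described above.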


-- 

*Leonardo Gamas*
Software Engineer
T +55 (71) 3494-3514
C +55 (75) 8134-7440
leogamas@jusbrasil.com.br
www.jusbrasil.com.br
