hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: Yet another "get the last 100 rows" question...
Date Fri, 06 Jan 2012 16:21:14 GMT

re:  "Filters are not run on the Client"

I think I need to fix the section heading in the Architecture chapter of
the RefGuide.  I called it "Client Filters" because I was thinking
"filters that are used by client requests."  The detailed description
explains that these are applied in the RegionServer, but the description
is not as clear as it could be (i.e., "Client Filters" makes it seem that
the filter is applied on the client).  I'll adjust.




On 1/5/12 5:38 PM, "Peter Wolf" <opus111@gmail.com> wrote:

>Ah ha!  Thank you for the prompt and useful response :-)
>
>The reverse timestamp key does the trick.  Thank you!
>
>So, Filters are not run on the Client.  For example, a
>SingleColumnValueFilter does its comparisons on the Server and is
>reasonably efficient.  Is this correct?
>
>P
>
>
>On 1/5/12 5:23 PM, Leonardo Gamas wrote:
>> 1) Filters are applied directly in the RegionServer:
>>
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html
>>
>> 2) You can reverse the timestamp:
>> http://hbase.apache.org/book/rowkey.design.html#reverse.timestamp
>>
>> So you will have the rowkey: <accountID><reverse timestamp>
>> In the scan, set the caching attribute to 100 so the matches are
>> transferred to the client in a single trip. You need to count the number
>> of times you call next() on the scanner so that you don't exceed the
>> cache and cause a new call to the RegionServer.
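
The reverse-timestamp key suggested above can be illustrated without a
cluster. The following is only a minimal sketch, not HBase client code: it
assumes the key is the account ID bytes followed by an 8-byte big-endian
value of Long.MAX_VALUE minus a non-negative timestamp, and it uses a
TreeMap with an unsigned byte comparator as a stand-in for HBase's sorted
row keys. The class name, method names, and the "acct42" account are
hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReverseTimestampKey {

    // Row key = <accountID bytes><8-byte big-endian (Long.MAX_VALUE - timestamp)>.
    // Assumes timestampMillis is non-negative, so newer rows get smaller suffixes.
    static byte[] rowKey(String accountId, long timestampMillis) {
        byte[] account = accountId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(account.length + Long.BYTES)
                .put(account)
                .putLong(Long.MAX_VALUE - timestampMillis)
                .array();
    }

    // Unsigned lexicographic byte comparison, standing in for row key ordering.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    // Returns up to `limit` values in key order, which is newest first.
    static List<String> newest(TreeMap<byte[], String> table, int limit) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<byte[], String> e : table.entrySet()) {
            if (out.size() == limit) {
                break;
            }
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a table sorted by row key.
        TreeMap<byte[], String> table = new TreeMap<>(ReverseTimestampKey::compareKeys);
        for (long ts = 1; ts <= 500; ts++) {
            table.put(rowKey("acct42", ts), "interaction-" + ts);
        }
        List<String> last100 = newest(table, 100);
        System.out.println(last100.get(0));
        System.out.println(last100.get(99));
    }
}
```

With 500 interactions inserted, the first entry read is interaction-500 and
the hundredth is interaction-401, so reading 100 rows from the start of the
account's key range yields the most recent 100 without touching the rest.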
>>
>> 2012/1/5 Peter Wolf<opus111@gmail.com>
>>
>>> Hello all,
>>>
>>> I am a new HBase user with a familiar problem.  I need to efficiently
>>> return the last 100 rows from an account.  I searched the archives, and
>>> read the book, but did not find a complete answer.
>>>
>>> I have a table of interactions with my users.  One row per interaction.
>>>
>>> I am using a composite Row Key of the form
>>>
>>> <accountID><timestamp>
>>>
>>> So using partial row key scans I can efficiently get all the rows for
>>> an account.
>>>
>>> Unfortunately, I do not know how to relate row count to timestamp, so I
>>> have to get all the rows.  I then use a PageFilter to get only the
>>> last 100.
>>>
>>> However, I believe that Filters operate on the Client side, so all of
>>> the rows get transmitted.  I believe this is not efficient.
>>>
>>> I have two questions--
>>>
>>> 1) Am I correct that my solution is not efficient, and I need to
>>> filter at the Server?
>>> 2) If so, is there a "best practice" for this problem?
>>>
>>> Thanks in advance
>>> Peter
>>>
>>
>>
>
>


