hbase-user mailing list archives

From NNever <nnever...@gmail.com>
Subject Re: Scan triggered per page-request, performance-impacts?
Date Tue, 05 Jun 2012 02:33:52 GMT
Is the schema like this:

T2 {
  rowkey: rs-time row
  {
    family:qualifier = t1's row
  }
}

Then you scan the newest 1000 rows from T2, take the t1 row stored in each,
and then do 1000 Gets against T1 for one page?
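
A minimal plain-Java sketch of that read path, assuming the T2 row key is an 8-byte reverse timestamp (Long.MAX_VALUE - ts, the usual reverse-timestamp trick) followed by the t1 row key. A TreeMap ordered by unsigned byte comparison stands in for a table; this is not HBase client code, and the helper names (t2RowKey, newestT1Rows, fetchFromT1) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NewestFirstIndex {
    // Hypothetical T2 row key: 8-byte reverse timestamp, then the t1 row key.
    // Long.MAX_VALUE - ts makes newer entries sort first in unsigned byte order.
    public static byte[] t2RowKey(long timestampMillis, String t1Row) {
        byte[] t1 = t1Row.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + t1.length)
                .putLong(Long.MAX_VALUE - timestampMillis)
                .put(t1)
                .array();
    }

    // In-memory stand-in for a table: keys kept in unsigned lexicographic order.
    public static TreeMap<byte[], String> newTable() {
        Comparator<byte[]> unsigned = Arrays::compareUnsigned;
        return new TreeMap<>(unsigned);
    }

    // Like a scan limited to n rows: walk T2 from the start, collecting t1 row keys.
    public static List<String> newestT1Rows(TreeMap<byte[], String> t2, int n) {
        List<String> out = new ArrayList<>();
        for (String t1Row : t2.values()) {
            if (out.size() >= n) break;
            out.add(t1Row);
        }
        return out;
    }

    // Like a batch of Gets against T1: one lookup per key, missing rows skipped.
    public static List<String> fetchFromT1(Map<String, String> t1, List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            String value = t1.get(row);
            if (value != null) out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> t2 = newTable();
        t2.put(t2RowKey(1000L, "a"), "a");
        t2.put(t2RowKey(3000L, "c"), "c");
        t2.put(t2RowKey(2000L, "b"), "b");
        Map<String, String> t1 = Map.of("a", "A", "b", "B", "c", "C");
        System.out.println(fetchFromT1(t1, newestT1Rows(t2, 2))); // [C, B]
    }
}
```

Because newer timestamps produce smaller keys, the first rows a scan returns are the newest ones, so limiting the scan to 1000 rows yields the newest 1000 without any client-side sorting.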

2012/6/5 NNever <nneverwei@gmail.com>

> '- I'd like to do the top N stuff on the server side to reduce traffic,
> will this be possible? '
>
> A coprocessor Endpoint?
>
>
> 2012/6/5 Em <mailformailinglists@yahoo.de>
>
>> Hello list,
>>
>> let's say I have to fetch a lot of rows for a page request (say
>> 1,000-2,000).
>> The row keys are composed of a fixed id of an object and a
>> sequential, ever-increasing id. Salting those keys for load balancing
>> may also be considered.
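
A sketch of such a composite key, assuming both ids are non-negative longs stored as 8-byte big-endian values (so unsigned byte order matches numeric order). BUCKETS and the hash-based salt are hypothetical choices for illustration, not a recommended configuration.

```java
import java.nio.ByteBuffer;

public class CompositeKey {
    // Hypothetical number of salt buckets.
    static final int BUCKETS = 16;

    // Unsalted key: 8-byte fixed id, then 8-byte sequential id (both assumed >= 0),
    // so all rows of one object are contiguous and ordered by sequence.
    public static byte[] key(long fixedId, long seq) {
        return ByteBuffer.allocate(16).putLong(fixedId).putLong(seq).array();
    }

    // One-byte salt derived from the fixed id only: every row of an object lands
    // in the same bucket, so one object can still be read with a single range scan.
    public static byte salt(long fixedId) {
        return (byte) Math.floorMod(Long.hashCode(fixedId), BUCKETS);
    }

    // Salted key: salt byte prepended to the unsalted layout.
    public static byte[] saltedKey(long fixedId, long seq) {
        return ByteBuffer.allocate(17).put(salt(fixedId)).putLong(fixedId).putLong(seq).array();
    }

    // The sequential part is the last 8 bytes in both layouts.
    public static long sequenceOf(byte[] key) {
        return ByteBuffer.wrap(key, key.length - 8, 8).getLong();
    }

    public static void main(String[] args) {
        System.out.println(sequenceOf(saltedKey(42L, 7L))); // 7
        System.out.println(salt(42L)); // 10
    }
}
```

Deriving the salt from the fixed id rather than from the sequential id keeps all rows of one object in a single bucket, so reading an object still needs only one range scan; salting on the sequential id would spread an object's writes further but force one scan per bucket.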
>>
>> I want to do a join like the following one, expressed in SQL:
>>
>> SELECT t1.columns FROM t1
>> JOIN t2 ON (t1.id = t2.id)
>> WHERE t2.id LIKE 'fixedID%'
>>
>> I know that HBase does not support this out of the box.
>> My approach is to store all the fixed ids as columns of a row in t1.
>> When I read a row, I fetch the columns that are of interest to me,
>> each of which contains a fixedID for t2.
>> Then I do a scan on t2 for each fixedID, which should return exactly
>> one value per fixedID (a kind of reverse-timestamp approach, as in
>> the HBase book).
>> Furthermore, I am really only interested in the key itself. I don't care
>> about the columns (t2 is more like an index).
>> Having fetched one row per fixedID, I sort them by the sequential part
>> of their keys and take the top N.
>> For those top N, I'll fetch the data from t1.
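
The sort-and-take-top-N step can be done on the client without sorting every candidate by keeping a bounded min-heap instead of doing a full sort. A sketch, assuming each per-fixedID scan has already been reduced to one hypothetical Candidate holding the fixed id and the sequential part of its t2 key:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class TopN {
    // Hypothetical result of one per-fixedID scan on t2.
    public static final class Candidate {
        public final String fixedId;
        public final long seq; // sequential part of the t2 key

        public Candidate(String fixedId, long seq) {
            this.fixedId = fixedId;
            this.seq = seq;
        }
    }

    // Keeps the n candidates with the largest sequence numbers in a min-heap of
    // size n: O(m log n) for m candidates instead of sorting all of them.
    public static List<Candidate> topN(List<Candidate> candidates, int n) {
        PriorityQueue<Candidate> heap =
                new PriorityQueue<>((a, b) -> Long.compare(a.seq, b.seq));
        for (Candidate c : candidates) {
            heap.offer(c);
            if (heap.size() > n) heap.poll(); // evict the oldest candidate seen so far
        }
        List<Candidate> result = new ArrayList<>(heap);
        result.sort((a, b) -> Long.compare(b.seq, a.seq)); // newest first
        return result;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
                new Candidate("a", 5L), new Candidate("b", 9L),
                new Candidate("c", 1L), new Candidate("d", 7L));
        for (Candidate c : topN(candidates, 2)) {
            System.out.println(c.fixedId + " " + c.seq); // prints "b 9", then "d 7"
        }
    }
}
```
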
>>
>> The use case is to fetch the N most recent entities of t1 that are
>> associated with a specific entity in t1, using t2 as an index.
>> T2 has one extra benefit over t1: you can do range scans, if necessary.
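
A prefix range scan on t2 needs a start row (the prefix itself) and an exclusive stop row. A plain-Java sketch of computing that stop row, with a TreeMap under unsigned byte order standing in for t2; the string keys with a '#' delimiter are a hypothetical layout, not something stated in the thread:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class PrefixRange {
    // Exclusive upper bound for all keys starting with prefix: drop trailing 0xFF
    // bytes, then increment the last remaining byte. Returns null if the prefix is
    // all 0xFF, meaning the scan has to run to the end of the table.
    public static byte[] stopRowForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return null;
    }

    // Rows whose key starts with prefix, in key order: [prefix, stopRow).
    public static NavigableMap<byte[], String> prefixScan(TreeMap<byte[], String> table, byte[] prefix) {
        byte[] stop = stopRowForPrefix(prefix);
        return stop == null
                ? table.tailMap(prefix, true)
                : table.subMap(prefix, true, stop, false);
    }

    public static void main(String[] args) {
        Comparator<byte[]> unsigned = Arrays::compareUnsigned;
        TreeMap<byte[], String> t2 = new TreeMap<>(unsigned);
        for (String k : new String[] {"obj1#001", "obj1#002", "obj2#001", "obj10#001"}) {
            t2.put(k.getBytes(StandardCharsets.UTF_8), k);
        }
        System.out.println(prefixScan(t2, "obj1#".getBytes(StandardCharsets.UTF_8)).values());
        // [obj1#001, obj1#002]
    }
}
```

The delimiter matters: without it, a scan for the prefix obj1 would also return the rows of obj10.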
>>
>> Questions:
>> - Since this is triggered by a page request, will it return with low
>> latency?
>> - Is there a way to do those scans in a batch? Maybe I can
>> combine them into one big scanner, using a custom filter for what I want?
>> - Do you have any thoughts on improving this type of request?
>> - I'd like to do the top N stuff on the server side to reduce traffic,
>> will this be possible?
>> - I am not sure whether a Scan is really what I want. Maybe a multi-get
>> combined with a RowFilter will fit my needs better?
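
Doing the top N on the server usually means each region computes a local top N and the client merges the partial results, which is the kind of work a coprocessor Endpoint could do. The sketch below only simulates that pattern with in-memory lists as hypothetical regions; it does not use any coprocessor API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PartialTopN {
    // What each region would compute locally: its n largest values (n >= 0).
    public static List<Long> localTopN(List<Long> regionRows, int n) {
        List<Long> sorted = new ArrayList<>(regionRows);
        sorted.sort(Comparator.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    // Client side: merge the per-region partial results and keep the global n largest.
    public static List<Long> mergeTopN(List<List<Long>> regions, int n) {
        List<Long> partials = new ArrayList<>();
        for (List<Long> region : regions) {
            partials.addAll(localTopN(region, n));
        }
        return localTopN(partials, n);
    }

    public static void main(String[] args) {
        List<List<Long>> regions = List.of(
                List.of(3L, 17L, 8L),
                List.of(21L, 2L),
                List.of(11L, 15L, 19L, 1L));
        System.out.println(mergeTopN(regions, 3)); // [21, 19, 17]
    }
}
```

Only (number of regions × N) values cross the network instead of every candidate row, and the merged result is exact because any row in the global top N is also in the top N of its own region.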
>>
>>
>> I am working hard on finding the best way to map this
>> m:n relation to an HBase schema, so any help is appreciated.
>>
>> Please note: I haven't written a single line of HBase code so far. Currently
>> I am studying books, blog posts, slides and the mailing lists to
>> learn more about HBase.
>>
>> Thanks!
>>
>> Kind regards,
>> Em
>>
>
>
