hbase-user mailing list archives

From: Em <mailformailingli...@yahoo.de>
Subject: Scan triggered per page-request, performance-impacts?
Date: Mon, 04 Jun 2012 20:55:20 GMT

Hello list,

Let's say I have to fetch a lot of rows for a page request (say
1,000-2,000).
Each row key is composed of the fixed id of an object followed by a
sequential, ever-increasing id. I may salt those keys for load
balancing.
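
To make the key layout concrete, here is a minimal sketch in plain
Python (no HBase client involved). The names make_row_key, NUM_BUCKETS
and the 8-byte big-endian encoding of the sequential id are assumptions
for illustration only, as is the choice to derive the salt from the
fixed id so that all rows of one object stay in the same bucket.

```python
import hashlib
import struct

NUM_BUCKETS = 16  # assumption: number of salt buckets, chosen arbitrarily


def make_row_key(fixed_id: bytes, seq: int, salted: bool = False) -> bytes:
    """Build a row key: [salt byte] + fixed id + 8-byte big-endian sequence."""
    # Big-endian encoding keeps byte-wise (lexicographic) order equal to
    # numeric order, which is how a sorted key-value store compares keys.
    key = fixed_id + struct.pack(">Q", seq)
    if salted:
        # Assumption: the salt depends only on the fixed id, so every row
        # of one object lands in the same bucket and prefix scans still work.
        bucket = hashlib.md5(fixed_id).digest()[0] % NUM_BUCKETS
        key = bytes([bucket]) + key
    return key
```

With this layout, make_row_key(b"obj1", 255) sorts before
make_row_key(b"obj1", 256), which would not hold if the sequential id
were stored as a decimal string without padding.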

I want to do a Join like this one expressed in SQL:

SELECT t1.columns FROM t1
JOIN t2 ON (t1.id = t2.id)
WHERE t2.id = fixedID-prefix

I know that HBase does not support that out of the box.
My approach is to store all the fixed ids as columns of a row in t1.
When I select a row, I fetch the columns that are of interest to me;
each column contains a fixedID for t2.
Then I do a scan on t2 for each fixedID, which should return exactly
one row per fixedID (it's a kind of reverse-timestamp approach, as
described in the HBase book).
I am really only interested in the key itself; I don't care about the
columns (t2 is more like an index).
Having fetched one row per fixedID, I sort them by the sequential part
of their key and take the top N.
For those top N I then fetch the data from t1.
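
The lookup described above can be sketched in memory, with a sorted
Python list standing in for t2. This is only an illustration of the
idea, not HBase code: index_key, first_key_with_prefix, top_n_recent
and MAX_SEQ are hypothetical names, and the sketch assumes fixed ids of
equal length and sequential ids that fit into 8 bytes.

```python
import bisect
import struct

MAX_SEQ = 2**64 - 1  # assumption: sequential ids fit into 8 bytes


def index_key(fixed_id: bytes, seq: int) -> bytes:
    """t2 key: fixed id + reversed sequence, so the newest entry sorts first."""
    return fixed_id + struct.pack(">Q", MAX_SEQ - seq)


def first_key_with_prefix(sorted_keys, prefix):
    """Stand-in for a scan that starts at `prefix` and stops after one row."""
    i = bisect.bisect_left(sorted_keys, prefix)
    if i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        return sorted_keys[i]
    return None


def top_n_recent(t2_keys, fixed_ids, n):
    """One lookup per fixed id, then sort by sequence and keep the top n."""
    hits = []
    for fid in fixed_ids:
        key = first_key_with_prefix(t2_keys, fid)
        if key is not None:
            # Undo the reversal to recover the original sequential id.
            seq = MAX_SEQ - struct.unpack(">Q", key[-8:])[0]
            hits.append((seq, fid))
    hits.sort(reverse=True)
    return hits[:n]
```

Each call to first_key_with_prefix corresponds to one short scan on t2,
so a page request with 1,000-2,000 fixed ids would mean that many
lookups before the final sort.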

The use case is to fetch the top N most recent entities of t1 that are
associated with a specific entity in t1, using t2 as an index.
t2 has one extra benefit over t1: you can do range scans, if necessary.

Questions:
- Since this is triggered by a page request: will it return with low
latency?
- Is it possible to run those scans as a batch? Maybe I can combine
them into one big scanner, using a custom filter for what I want?
- Do you have thoughts on improving this type of request?
- I'd like to do the top-N selection on the server side to reduce
traffic. Is this possible?
- I am not sure whether a Scan is really what I want. Maybe a multi-get
combined with a RowFilter would fit my needs better?


I am working hard on finding the best way to map this m:n relation
to an HBase schema, so any help is appreciated.

Please note: I haven't written a single line of HBase code so far.
Currently I am studying books, blog posts, slides and the mailing
lists to learn more about HBase.

Thanks!

Kind regards,
Em
