hbase-user mailing list archives

From NNever <nnever...@gmail.com>
Subject Re: Scan triggered per page-request, performance-impacts?
Date Tue, 05 Jun 2012 06:07:09 GMT
1. An Endpoint is a kind of Coprocessor; it was added in 0.92. You can think
of it as a little like a relational database's stored procedure: it is logic
that runs on the HBase server side. With it you may reduce your app's RPC
calls or, as you said, reduce traffic.
You can find some help on Coprocessors/Endpoints here:
https://blogs.apache.org/hbase/entry/coprocessor_introduction
2. I am still a little confused about what exactly you want with this table
structure (sorry for that, but my mother tongue is not English).
You mean t1 holds the original data of some objects,
and t2 keeps something about the objects in t1 (like logs: 10:11 Em checked
t1obj1; 10:13 Em bought t1obj1; 10:30 Em took away t1obj1)?
3. You said 'This data is then sorted by the time part of the returned
rowkeys to get the Top N of these.' Well, there may be no need to do the
sort. HBase keeps rows in dictionary (lexicographic) order of their row
keys, so if you just fetch N of them, they are already ordered.
4. I have not been using HBase for long; in fact I am still a noob at it :).
I would be glad if anything here helps you.
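
To make point 3 a bit more concrete, here is a small sketch in plain Java (no HBase client involved, Java 9+ for Arrays.compareUnsigned). It assumes the rowkey layout from Em's mail, t1_id-(Long.MAX_VALUE - time), and it assumes the time part is stored as 8 big-endian bytes; the class and method names are made up for illustration only. It sorts the keys by unsigned byte comparison, which is how row keys are ordered. Within one t1_id prefix the newest entry comes first without any sorting, but entries from different prefixes are ordered by prefix first, so picking a Top N across several prefixes would still need a merge on the time part.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReverseTimestampKeys {

    // Hypothetical helper: "<t1Id>-" followed by 8 big-endian bytes
    // holding (Long.MAX_VALUE - time), as in the rowkey layout from the thread.
    static byte[] rowKey(String t1Id, long time) {
        byte[] prefix = (t1Id + "-").getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(Long.MAX_VALUE - time)
                .array();
    }

    // Recovers the original time from the last 8 bytes of a key.
    static long timeOf(byte[] key) {
        long reversed = ByteBuffer
                .wrap(key, key.length - Long.BYTES, Long.BYTES)
                .getLong();
        return Long.MAX_VALUE - reversed;
    }

    // Orders keys by unsigned lexicographic byte comparison,
    // imitating the order in which a scan would return them.
    static List<byte[]> sortedLikeHBase(List<byte[]> keys) {
        List<byte[]> copy = new ArrayList<>(keys);
        copy.sort(Arrays::compareUnsigned);
        return copy;
    }

    public static void main(String[] args) {
        List<byte[]> keys = Arrays.asList(
                rowKey("a", 100L),
                rowKey("b", 200L),
                rowKey("a", 300L));
        // Prints the time part of each key in scan order.
        for (byte[] key : sortedLikeHBase(keys)) {
            System.out.println(timeOf(key));
        }
    }
}
```

In this sketch the "a" keys come back newest first (300, then 100), and the "b" key (200) comes after both of them even though it is newer than one of the "a" entries.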

Best Regards,
NN


2012/6/5 Em <mailformailinglists@yahoo.de>

> Hi,
>
> what do you mean by endpoint?
>
> It would look more like
>
> T2 {
>   rowkey: t1_id-(Long.MAX_VALUE - time)
>   {
>      family: qualifier = dummyDataSinceOnlyTheRowkeyMatters
>   }
> }
>
> For every t1_id associated with a specific object, one gets the newest
> entry in the T2-table (newest in relation to the key, not the internal
> timestamp of creation).
> This data is then sorted by the time part of the returned rowkeys to get
> the Top N of these.
> And then you get N records from t1 again.
>
> At least, that's what I thought about, though I am not sure that this is
> the most efficient way.
>
> Kind regards,
> Em
>
> On 05.06.2012 04:33, NNever wrote:
> > Does the schema look like this:
> >
> > T2{
> >   rowkey: rs-time row
> >    {
> >        family:qualifier =  t1's row
> >    }
> > }
> >
> > Then you Scan the newest 1000 from T2, get the t1Row from each, and then
> > do 1000 Gets from T1 for one page?
> >
> > 2012/6/5 NNever <nneverwei@gmail.com>
> >
> >> '- I'd like to do the top N stuff on the server side to reduce traffic,
> >> will this be possible? '
> >>
> >> Endpoint?
> >>
> >>
> >> 2012/6/5 Em <mailformailinglists@yahoo.de>
> >>
> >>> Hello list,
> >>>
> >>> let's say I have to fetch a lot of rows for a page-request (say
> >>> 1,000-2,000).
> >>> The row-keys are a composition of a fixed id of an object and a
> >>> sequential ever-increasing id. Salting those keys for balancing may be
> >>> taken into consideration.
> >>>
> >>> I want to do a Join like this one expressed in SQL:
> >>>
> >>> SELECT t1.columns FROM t1
> >>> JOIN t2 ON (t1.id = t2.id)
> >>> WHERE t2.id = fixedID-prefix
> >>>
> >>> I know that HBase does not support that out of the box.
> >>> My approach is to have all the fixed-ids as columns of a row in t1.
> >>> Selecting a row, I fetch those columns that are of interest for me,
> >>> where each column contains a fixedID for t2.
> >>> Now I do a scan on t2 for each fixedID which should return me exactly
> >>> one value per fixedID (it's kind of a reverse-timestamp-approach like
> >>> in the HBase-book).
> >>> Furthermore I am really only interested in the key itself. I don't care
> >>> about the columns (t2 is more like an index).
> >>> Having fetched a row per fixedID, I sort based on the sequential part
> >>> of their key and get the top N.
> >>> For those top N I'll fetch data from t1.
> >>>
> >>> The usecase is to fetch the top N most recent entities of t1 that are
> >>> associated with a specific entity in t1 by using t2 as an index.
> >>> T2 has one extra benefit over t1: You can do range-scans, if
> >>> necessary.
> >>>
> >>> Questions:
> >>> - since this is triggered by a page-request: Will this return with low
> >>> latency?
> >>> - is there a possibility to do those Scans in a batch? Maybe I can
> >>> combine them into one big scanner, using a custom filter for what I
> >>> want?
> >>> - do you have thoughts on improving this type of request?
> >>> - I'd like to do the top N stuff on the server side to reduce traffic,
> >>> will this be possible?
> >>> - I am not sure whether a Scan is really what I want. Maybe a Multiget
> >>> will fit my needs better combined with a RowFilter?
> >>>
> >>>
> >>> I really work hard on finding the best approach of mapping this
> >>> m:n-relation to a HBase schema - so any help is appreciated.
> >>>
> >>> Please note: I haven't written any line of HBase code so far. Currently
> >>> I am studying books, blog-posts, slides and the mailing lists for
> >>> learning more about HBase.
> >>>
> >>> Thanks!
> >>>
> >>> Kind regards,
> >>> Em
> >>>
> >>
> >>
> >
>
