hbase-user mailing list archives

From Adrien Mogenet <adrien.moge...@gmail.com>
Subject Re: HBase - Secondary Index
Date Sun, 06 Jan 2013 20:40:42 GMT
Are you talking about data block encoding of K/V?
https://issues.apache.org/jira/browse/HBASE-4218
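
For anyone unfamiliar with HBASE-4218: the PREFIX data block encoding stores each key as the length of the prefix it shares with the previous key, plus the remaining suffix. The sketch below is only a simplified illustration. It is plain Python, handles keys without values, and does not follow the HBase on-disk byte layout; `prefix_encode`, `prefix_decode` and `seek` are hypothetical names. It also hints at why seeks inside an encoded block are not free. Each key can only be rebuilt from the one before it, so `seek` has to decode forward from the start of the block. That sequential walk is the kind of overhead the prefix-trie encoding discussed in the thread is meant to reduce.

```python
from typing import List, Optional, Tuple

# One (shared_prefix_length, suffix) pair per key, in block order.
Encoded = List[Tuple[int, bytes]]


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b have in common."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def prefix_encode(sorted_keys: List[bytes]) -> Encoded:
    """Store each key as (bytes shared with the previous key, remaining suffix)."""
    encoded: Encoded = []
    prev = b""
    for key in sorted_keys:
        shared = common_prefix_len(prev, key)
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decode(encoded: Encoded) -> List[bytes]:
    """Rebuild every key; each one depends on the key decoded just before it."""
    keys: List[bytes] = []
    prev = b""
    for shared, suffix in encoded:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys


def seek(encoded: Encoded, target: bytes) -> Tuple[Optional[bytes], int]:
    """Return the first key >= target and how many entries had to be decoded.

    There is no random access here: the walk always starts at the first entry.
    """
    prev = b""
    for decoded, (shared, suffix) in enumerate(encoded, start=1):
        prev = prev[:shared] + suffix
        if prev >= target:
            return prev, decoded
    return None, len(encoded)
```

For example, encoding [b"cust001|ts100", b"cust001|ts200", b"cust002|ts100"] yields [(0, b"cust001|ts100"), (10, b"200"), (6, b"2|ts100")], and seeking b"cust001|ts150" has to decode two entries before it reaches b"cust001|ts200".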


On Sun, Jan 6, 2013 at 9:36 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:

> Does anyone have any links or information about the new prefix encoding
> feature in HBase that's being referred to in this mail?
>
> On Sun, Jan 6, 2013 at 12:30 PM, Adrien Mogenet <adrien.mogenet@gmail.com> wrote:
>
> > Nice topic, perhaps one of the most important for 2013 :-)
> > I still don't get how you ensure consistency between the index table and
> > the main table without an external component (such as BookKeeper/ZooKeeper).
> > What's the exact write path in your setup when inserting data?
> > (WAL/RegionObserver, pre/post put/WALEdit...)
> >
> > The underlying question is how you ensure that the WALEdits in the index
> > and main tables are perfectly synced, and how you're able to roll back in
> > both WALs in case of an issue?
> >
> >
> > On Fri, Dec 28, 2012 at 11:55 AM, Shengjie Min <kelvin.msj@gmail.com> wrote:
> >
> > > > Yes, as you say, when the number of rows to be returned grows, the
> > > > latency grows too. Seeks within an HFile block are a somewhat expensive
> > > > operation right now (not much, but still). The new prefix-trie encoding
> > > > will be a huge bonus here; with it, seeks will fly. [Ted also presented
> > > > this at Hadoop China.] Thanks to Matt... :) I am trying to measure scan
> > > > performance with this new encoding, and am trying to backport a simple
> > > > patch to version 0.94 just for testing. Yes, as the number of results to
> > > > be returned grows, any index performs worse, as per my study :)
> > >
> > > Yes, you are right; I guess it's just a drawback of any index approach.
> > > Thanks for the explanation.
> > >
> > > Shengjie
> > >
> > > On 28 December 2012 04:14, Anoop Sam John <anoopsj@huawei.com> wrote:
> > >
> > > > > Do you have a link to that presentation?
> > > >
> > > > http://hbtc2012.hadooper.cn/subject/track4TedYu4.pdf
> > > >
> > > > -Anoop-
> > > >
> > > > ________________________________________
> > > > From: Mohit Anchlia [mohitanchlia@gmail.com]
> > > > Sent: Friday, December 28, 2012 9:12 AM
> > > > To: user@hbase.apache.org
> > > > Subject: Re: HBase - Secondary Index
> > > >
> > > > On Thu, Dec 27, 2012 at 7:33 PM, Anoop Sam John <anoopsj@huawei.com> wrote:
> > > >
> > > > > Yes, as you say, when the number of rows to be returned grows, the
> > > > > latency grows too. Seeks within an HFile block are a somewhat
> > > > > expensive operation right now (not much, but still). The new
> > > > > prefix-trie encoding will be a huge bonus here; with it, seeks will
> > > > > fly. [Ted also presented this at Hadoop China.] Thanks to Matt... :)
> > > > > I am trying to measure scan performance with this new encoding, and
> > > > > am trying to backport a simple patch to version 0.94 just for
> > > > > testing. Yes, as the number of results to be returned grows, any
> > > > > index performs worse, as per my study :)
> > > > >
> > > > Do you have a link to that presentation?
> > > >
> > > >
> > > > > > btw, quick question: in your presentation, is the scale in seconds
> > > > > > or milliseconds? :)
> > > > >
> > > > > It is seconds. Don't focus on the exact values; what matters is the
> > > > > percentage increase in latency :) Those were not high-end machines.
> > > > >
> > > > > -Anoop-
> > > > > ________________________________________
> > > > > From: Shengjie Min [kelvin.msj@gmail.com]
> > > > > Sent: Thursday, December 27, 2012 9:59 PM
> > > > > To: user@hbase.apache.org
> > > > > Subject: Re: HBase - Secondary Index
> > > > >
> > > > > > Didn't follow you completely here. There won't be any get()
> > > > > > happening. Since we get the exact rowkey in a region from the index
> > > > > > table, we can seek to the exact position and return that row.
> > > > >
> > > > > Sorry, I misused "get()" here; I meant seeking. Yes, if only a small
> > > > > number of rows is returned, this works perfectly. As you said, you
> > > > > get the exact rowkey positions per region and simply seek to them. I
> > > > > was trying to work out the case where the number of result rows
> > > > > increases massively. In Anil's case, for example, he wants to run a
> > > > > scan query against the secondary index (timestamp), "select all rows
> > > > > from timestamp1 to timestamp2", with no customerId provided. During
> > > > > that time period, he might have a big chunk of rows from different
> > > > > customerIds. The index table returns a lot of rowkey positions for
> > > > > different customerIds (I believe they are scattered across different
> > > > > regions), so you end up seeking to many different positions in
> > > > > different regions to return all the rows needed. According to page 14
> > > > > of your presentation, "Performance Test Results (Scan)", without an
> > > > > index the time increases linearly with the number of result rows; with
> > > > > an index, on the other hand, the time spent climbs much more quickly.
> > > > >
> > > > > btw, quick question: in your presentation, is the scale in seconds or
> > > > > milliseconds? :)
> > > > >
> > > > > - Shengjie
> > > > >
> > > > >
> > > > > On 27 December 2012 15:54, Anoop John <anoop.hbase@gmail.com> wrote:
> > > > >
> > > > > > > how the massive number of get() is going to
> > > > > > > perform against the main table
> > > > > >
> > > > > > Didn't follow you completely here. There won't be any get()
> > > > > > happening. Since we get the exact rowkey in a region from the index
> > > > > > table, we can seek to the exact position and return that row.
> > > > > >
> > > > > > -Anoop-
> > > > > >
> > > > > > On Thu, Dec 27, 2012 at 6:37 PM, Shengjie Min <kelvin.msj@gmail.com> wrote:
> > > > > >
> > > > > > > how the massive number of get() is going to
> > > > > > > perform against the main table
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > All the best,
> > > > > Shengjie Min
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > All the best,
> > > Shengjie Min
> > >
> >
> >
> >
> > --
> > Adrien Mogenet
> > 06.59.16.64.22
> > http://www.mogenet.me
> >
>
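
The read path Anoop describes in the thread above (take the exact rowkeys from the index table, then seek to each one in the region that owns it instead of issuing a get()) can be modeled roughly as below. This is a toy in-memory sketch, not the implementation discussed in the thread and not an HBase API. `Region`, `locate_region` and `index_range_query` are hypothetical names, and the index table is assumed to be a sorted list of (timestamp, main_rowkey) pairs. It makes Shengjie's point concrete: every matching row costs one positioned seek, so the work grows with the number of results. A full scan's cost, by contrast, is bounded by the amount of data scanned.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class Region:
    """Toy stand-in for one region of the main table: rows sorted by rowkey."""

    def __init__(self, start_key: str, rows: Dict[str, str]) -> None:
        self.start_key = start_key
        self.keys = sorted(rows)
        self.values = [rows[k] for k in self.keys]
        self.seeks = 0  # number of positioned lookups served by this region

    def seek(self, rowkey: str) -> Optional[str]:
        # Binary search stands in for positioning a scanner on an exact rowkey.
        self.seeks += 1
        i = bisect.bisect_left(self.keys, rowkey)
        if i < len(self.keys) and self.keys[i] == rowkey:
            return self.values[i]
        return None


def locate_region(regions: List[Region], rowkey: str) -> Region:
    # Regions are sorted by start key and the first one starts at "";
    # the owner is the last region whose start key is <= rowkey.
    starts = [r.start_key for r in regions]
    return regions[bisect.bisect_right(starts, rowkey) - 1]


def index_range_query(index: List[Tuple[int, str]], regions: List[Region],
                      lo: int, hi: int) -> List[Tuple[str, str]]:
    """Return main-table rows whose indexed timestamp lies in [lo, hi]."""
    # 1. Range-scan the (sorted) index table.
    start = bisect.bisect_left(index, (lo, ""))
    per_region: Dict[str, List[str]] = defaultdict(list)
    for ts, rowkey in index[start:]:
        if ts > hi:
            break
        # 2. Group the matching rowkeys by the region that owns them.
        per_region[locate_region(regions, rowkey).start_key].append(rowkey)
    # 3. In each region, seek to every matching rowkey in sorted order.
    results: List[Tuple[str, str]] = []
    for region in regions:
        for rowkey in sorted(per_region.get(region.start_key, [])):
            value = region.seek(rowkey)
            if value is not None:
                results.append((rowkey, value))
    return results
```

With two regions starting at "" and "cust5", a timestamp range that matches rows owned by both regions costs one seek per matching row in each region, however far apart those rows are.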



-- 
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me
