hbase-user mailing list archives

From anil gupta <anilgupt...@gmail.com>
Subject Re: HBase - Secondary Index
Date Sun, 06 Jan 2013 22:12:21 GMT
@Mohit: Here is the JIRA for the prefix compression discussed in this thread:
https://issues.apache.org/jira/browse/HBASE-4676
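
In case it helps, the basic idea behind prefix compression of sorted keys can
be sketched roughly as below. This is only an illustration in Python with
made-up row keys and hypothetical helper names (prefix_encode, prefix_decode);
it is not the prefix-trie block format from HBASE-4676, which is more
elaborate.

```python
def shared_prefix_len(a, b):
    """Return the number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_encode(sorted_keys):
    """Store each key as (length shared with previous key, remaining suffix)."""
    encoded = []
    prev = ""
    for key in sorted_keys:
        shared = shared_prefix_len(prev, key)
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decode(encoded):
    """Rebuild the original keys by replaying the shared prefixes in order."""
    keys = []
    prev = ""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


# Hypothetical row keys of the form <customerId>_<date>, kept in sorted order
# the way keys are sorted inside a block.
keys = sorted(["cust001_20121227", "cust001_20121228", "cust002_20121227"])
encoded = prefix_encode(keys)
decoded = prefix_decode(encoded)
```

Because neighbouring keys in a sorted block tend to share long prefixes, most
entries shrink to a small integer plus a short suffix.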

HTH,
Anil Gupta

On Sun, Jan 6, 2013 at 12:40 PM, Adrien Mogenet <adrien.mogenet@gmail.com> wrote:

> Are you talking about data block encoding of K/V?
> https://issues.apache.org/jira/browse/HBASE-4218
>
>
> On Sun, Jan 6, 2013 at 9:36 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
>
> > Does anyone have any links or information on the new prefix encoding
> > feature in HBase that's being referred to in this mail?
> >
> > On Sun, Jan 6, 2013 at 12:30 PM, Adrien Mogenet <adrien.mogenet@gmail.com> wrote:
> >
> > > Nice topic, perhaps one of the most important for 2013 :-)
> > > I still don't get how you're ensuring consistency between the index
> > > table and the main table without an external component (such as
> > > bookkeeper/zookeeper).
> > > What's the exact write path in your situation when inserting data?
> > > (WAL/RegionObserver, pre/post put/WALEdit...)
> > >
> > > The underlying question is how you're ensuring that the WALEdits in the
> > > index and main tables are perfectly synced, and how you're able to roll
> > > back in case of an issue in both WALs?
> > >
> > >
> > > On Fri, Dec 28, 2012 at 11:55 AM, Shengjie Min <kelvin.msj@gmail.com> wrote:
> > >
> > > > >Yes, as you say, as the number of rows to be returned grows, the
> > > > >latency grows too. Seeks within an HFile block are a somewhat
> > > > >expensive op now. (Not much, but still.) The new prefix trie encoding
> > > > >will be a huge bonus here. There the seeks will be flying.. [Ted also
> > > > >presented this at Hadoop China] Thanks to Matt... :) I am trying to
> > > > >measure the scan performance with this new encoding. Trying to
> > > > >backport a simple patch for the 94 version just for testing... Yes,
> > > > >when the number of results to be returned grows, any index will
> > > > >perform less well, as per my study :)
> > > >
> > > > Yes, you are right, I guess it's just a drawback of any index approach.
> > > > Thanks for the explanation.
> > > >
> > > > Shengjie
> > > >
> > > > On 28 December 2012 04:14, Anoop Sam John <anoopsj@huawei.com> wrote:
> > > >
> > > > > > Do you have a link to that presentation?
> > > > >
> > > > > http://hbtc2012.hadooper.cn/subject/track4TedYu4.pdf
> > > > >
> > > > > -Anoop-
> > > > >
> > > > > ________________________________________
> > > > > From: Mohit Anchlia [mohitanchlia@gmail.com]
> > > > > Sent: Friday, December 28, 2012 9:12 AM
> > > > > To: user@hbase.apache.org
> > > > > Subject: Re: HBase - Secondary Index
> > > > >
> > > > > On Thu, Dec 27, 2012 at 7:33 PM, Anoop Sam John <anoopsj@huawei.com> wrote:
> > > > >
> > > > > > > Yes, as you say, as the number of rows to be returned grows, the
> > > > > > > latency grows too. Seeks within an HFile block are a somewhat
> > > > > > > expensive op now. (Not much, but still.) The new prefix trie
> > > > > > > encoding will be a huge bonus here. There the seeks will be
> > > > > > > flying.. [Ted also presented this at Hadoop China] Thanks to
> > > > > > > Matt... :) I am trying to measure the scan performance with this
> > > > > > > new encoding. Trying to backport a simple patch for the 94 version
> > > > > > > just for testing... Yes, when the number of results to be returned
> > > > > > > grows, any index will perform less well, as per my study :)
> > > > > >
> > > > > > > Do you have a link to that presentation?
> > > > >
> > > > >
> > > > > > > >btw, quick question: in your presentation, is the scale there
> > > > > > > >seconds or milliseconds? :)
> > > > > >
> > > > > > > It is seconds. Don't consider the exact values. What matters is
> > > > > > > the % increase in latency :) Those were not high-end machines.
> > > > > >
> > > > > > -Anoop-
> > > > > > ________________________________________
> > > > > > From: Shengjie Min [kelvin.msj@gmail.com]
> > > > > > Sent: Thursday, December 27, 2012 9:59 PM
> > > > > > To: user@hbase.apache.org
> > > > > > Subject: Re: HBase - Secondary Index
> > > > > >
> > > > > > > >Didn't follow you completely here. There won't be any get()
> > > > > > > >happening. As we get the exact rowkey in a region from the index
> > > > > > > >table, we can seek to the exact position and return that row.
> > > > > >
> > > > > > > Sorry, I misused "get()" here; I meant seeking. Yes, if it's just
> > > > > > > a small number of rows returned, this works perfectly. As you
> > > > > > > said, you will get the exact rowkey positions per region and
> > > > > > > simply seek to them. I was trying to work out the case where the
> > > > > > > number of result rows increases massively. Like in Anil's case, he
> > > > > > > wants to do a scan query against the secondary index (timestamp):
> > > > > > > "select all rows from timestamp1 to timestamp2" with no customerId
> > > > > > > provided. During that time period, he might have a big chunk of
> > > > > > > rows from different customerIds. The index table returns a lot of
> > > > > > > rowkey positions for different customerIds (I believe they are
> > > > > > > scattered across different regions), so you end up seeking to all
> > > > > > > those positions in different regions to return all the rows
> > > > > > > needed. According to your presentation, page 14 - Performance Test
> > > > > > > Results (Scan), without the index the time increases linearly as
> > > > > > > the number of result rows increases. With the index, on the other
> > > > > > > hand, the time spent climbs much more quickly than without it.
> > > > > >
> > > > > > > btw, quick question: in your presentation, is the scale there
> > > > > > > seconds or milliseconds? :)
> > > > > >
> > > > > > - Shengjie
> > > > > >
> > > > > >
> > > > > > > On 27 December 2012 15:54, Anoop John <anoop.hbase@gmail.com> wrote:
> > > > > >
> > > > > > > > >how the massive number of get() is going to
> > > > > > > > perform against the main table
> > > > > > >
> > > > > > > > Didn't follow you completely here. There won't be any get()
> > > > > > > > happening. As we get the exact rowkey in a region from the index
> > > > > > > > table, we can seek to the exact position and return that row.
> > > > > > >
> > > > > > > -Anoop-
> > > > > > >
> > > > > > > > On Thu, Dec 27, 2012 at 6:37 PM, Shengjie Min <kelvin.msj@gmail.com> wrote:
> > > > > > >
> > > > > > > > > how the massive number of get() is going to
> > > > > > > > > perform against the main table
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > All the best,
> > > > > > Shengjie Min
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > All the best,
> > > > Shengjie Min
> > > >
> > >
> > >
> > >
> > > --
> > > Adrien Mogenet
> > > 06.59.16.64.22
> > > http://www.mogenet.me
> > >
> >
>
>
>
> --
> Adrien Mogenet
> 06.59.16.64.22
> http://www.mogenet.me
>



-- 
Thanks & Regards,
Anil Gupta
