hbase-dev mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Beware of PREFIX_TREE block encoding
Date Sun, 20 Oct 2013 11:06:17 GMT
Vladimir, any chance you could run the same test with FAST_DIFF?

J


2013/10/20 Vladimir Rodionov <vladrodionov@gmail.com>

> I wanted to try PREFIX_TREE because it is supposed to be the fastest on
> seek/reseek.
>
>
> On Sat, Oct 19, 2013 at 9:12 PM, lars hofhansl <larsh@apache.org> wrote:
>
> > I found FAST_DIFF to be the fastest of the block encoders.
> > (Prefix tree is in 0.96+ only as far as I know.)
> >
> > -- Lars
> >
> >
> >
> > ----- Original Message -----
> > From: Vladimir Rodionov <vladrodionov@gmail.com>
> > To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl <
> > larsh@apache.org>
> > Cc:
> > Sent: Saturday, October 19, 2013 9:08 PM
> > Subject: Re: Beware of PREFIX_TREE block encoding
> >
> > *Now, which encoder did you test specifically? I have seen a 20-40% slowdown
> > when everything is in the blockcache (which is the worst-case scenario
> > here), certainly not a 10x slowdown.*
> >
> > I have 1.3M rows (very small, 48 bytes each) in the block cache, which I read
> > sequentially using encodings NONE and PREFIX_TREE with both StoreScanner and
> > StoreFileScanner (close to the metal, i.e. the block cache :)
> >
> > Time to read all 1.3M rows, in ms:
> >
> > encoding      scanner            time (ms)
> > NONE          StoreScanner             300
> > PREFIX_TREE   StoreScanner             860
> > NONE          StoreFileScanner          52
> > PREFIX_TREE   StoreFileScanner         545
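To put the table in per-row terms, here is a small Java sketch that only does arithmetic on the four reported timings (1.3M rows, times in ms); it measures nothing itself, and the class and method names are illustrative.

```java
import java.util.Locale;

// Arithmetic on the four timings reported above (1.3M rows, times in ms).
// Nothing is measured here; class and method names are illustrative only.
final class ScanTimings {
    static final long ROWS = 1_300_000L;

    // Average cost per row, in nanoseconds, for a full pass that took `millis` ms.
    static double nanosPerRow(long millis) {
        return millis * 1_000_000.0 / ROWS;
    }

    // How many times slower the encoded run is than the unencoded one.
    static double slowdown(long encodedMillis, long plainMillis) {
        return (double) encodedMillis / plainMillis;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT,
                "StoreScanner:     NONE %.0f ns/row, PREFIX_TREE %.0f ns/row, %.1fx slower%n",
                nanosPerRow(300), nanosPerRow(860), slowdown(860, 300));
        System.out.printf(Locale.ROOT,
                "StoreFileScanner: NONE %.0f ns/row, PREFIX_TREE %.0f ns/row, %.1fx slower%n",
                nanosPerRow(52), nanosPerRow(545), slowdown(545, 52));
    }
}
```

Running it prints:

StoreScanner:     NONE 231 ns/row, PREFIX_TREE 662 ns/row, 2.9x slower
StoreFileScanner: NONE 40 ns/row, PREFIX_TREE 419 ns/row, 10.5x slower

On these numbers PREFIX_TREE adds roughly 430 ns per row under StoreScanner and roughly 380 ns under StoreFileScanner, so the ~10x ratio in the second case largely reflects how cheap the NONE baseline is there.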
> >
> > -Vladimir
> >
> >
> >
> >
> > On Sat, Oct 19, 2013 at 8:50 PM, lars hofhansl <larsh@apache.org> wrote:
> >
> > > That is (unfortunately) a known issue. The main problem is that HBase
> > > expects each KV to be backed by a contiguous byte[]. For any prefix
> > > encoding it is thus necessary to rematerialize the KV (i.e. copy all the
> > > partial bytes into a new location).
> > > That is inefficient. Nobody has taken this on yet (we're halfway there
> > > with Cells in 0.96, though).
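The rematerialization cost described above can be sketched with a deliberately simplified, hypothetical prefix encoding (not HBase's actual FAST_DIFF or PREFIX_TREE block format): each entry stores how many leading bytes it shares with the previous key, plus the remaining suffix. Because the caller expects every key as its own contiguous byte[], decoding has to allocate a new array and copy both the shared prefix and the suffix for every entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified prefix encoding (NOT HBase's real block format):
// each entry keeps the number of leading bytes shared with the previous key
// plus the remaining suffix bytes.
final class PrefixDecodeSketch {

    static final class Entry {
        final int sharedPrefixLen; // bytes reused from the previous key
        final byte[] suffix;       // bytes stored for this key only

        Entry(int sharedPrefixLen, byte[] suffix) {
            this.sharedPrefixLen = sharedPrefixLen;
            this.suffix = suffix;
        }
    }

    // Keys are expected in sorted order (as within an HFile block), so
    // neighbours share long prefixes; only the differing tail is stored.
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length, key.length);
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    // Rematerialize every key into its own contiguous byte[]: copy the shared
    // prefix out of the previous key, then append the stored suffix.
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> keys = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.sharedPrefixLen + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.sharedPrefixLen);
            System.arraycopy(e.suffix, 0, key, e.sharedPrefixLen, e.suffix.length);
            keys.add(key);
            prev = key;
        }
        return keys;
    }
}
```

For keys such as `row0001/cf:a`, `row0001/cf:b`, `row0002/cf:a`, the second entry stores only `b` (11 shared bytes), yet decoding still allocates and fills a full 12-byte array for it. The two `System.arraycopy` calls per entry are the kind of copy the message refers to.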
> > >
> > > There are JIRAs out there to fix this, like HBASE-7320 and, more recently,
> > > HBASE-9794.
> > >
> > > Now, which encoder did you test specifically? I have seen a 20-40% slowdown
> > > when everything is in the blockcache (which is the worst-case scenario
> > > here), certainly not a 10x slowdown.
> > >
> > > Note that with block encoding the blocks are stored encoded in the
> > > blockcache, so more data fits into the cache, and (obviously) there's less
> > > IO when the data is not in the cache. So the extra CPU cycles and memory
> > > bandwidth used are offset by that.
> > >
> > > There are other problems too. I just filed an issue (HBASE-9807): with
> > > block encoders we make a copy of the key portion of the KV on each reseek,
> > > just to compare it to the current scan key.
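HBASE-9807 is about a copy made only to run a comparison. A minimal sketch of the difference, with hypothetical names and a plain byte[] standing in for a decoded block (this is not HBase's actual scanner code): the first method copies the key bytes out before comparing, allocating on every reseek; the second compares the same byte range in place. Both use unsigned lexicographic byte order, which is how HBase orders row keys. `Arrays.compareUnsigned` requires Java 9 or later.

```java
import java.util.Arrays;

// Hypothetical sketch: compare a key embedded in a larger buffer against a
// seek key, once with a copy and once without. Not HBase code.
final class ReseekCompareSketch {

    // Copying variant: materialize the key bytes first, then compare.
    // Allocates a new array on every call.
    static int compareByCopy(byte[] block, int keyOffset, int keyLength, byte[] seekKey) {
        byte[] key = Arrays.copyOfRange(block, keyOffset, keyOffset + keyLength);
        return Arrays.compareUnsigned(key, seekKey);
    }

    // In-place variant: compare the byte range directly, with no allocation.
    static int compareInPlace(byte[] block, int keyOffset, int keyLength, byte[] seekKey) {
        return Arrays.compareUnsigned(block, keyOffset, keyOffset + keyLength,
                                      seekKey, 0, seekKey.length);
    }
}
```

Both methods order keys identically; the only difference is the per-call allocation and copy, which is what adds up when a scan reseeks once per row.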
> > >
> > > -- Lars
> > > ________________________________
> > > From: Vladimir Rodionov <vrodionov@carrieriq.com>
> > > To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> > > Sent: Saturday, October 19, 2013 7:34 PM
> > > Subject: RE: Beware of PREFIX_TREE block encoding
> > >
> > >
> > > What did I mean by this? HBase still does not have a block encoding
> > > that is optimal for both scans and seeks (reseeks).
> > > I do not think these goals are mutually exclusive.
> > >
> > >
> > > Best regards,
> > > Vladimir Rodionov
> > > Principal Platform Engineer
> > > Carrier IQ, www.carrieriq.com
> > > e-mail: vrodionov@carrieriq.com
> > >
> > > ________________________________________
> > >
> > > From: Vladimir Rodionov [vladrodionov@gmail.com]
> > > Sent: Saturday, October 19, 2013 7:32 PM
> > > To: dev@hbase.apache.org
> > > Subject: Beware of PREFIX_TREE block encoding
> > >
> > > The scan performance is bad: 10x slower in my tests than for blocks with
> > > NONE encoding. I scan data directly from the block cache through
> > > StoreFileScanner (bypassing all the StoreScanner/KeyValueHeap stuff). It
> > > should be clearly stated that this encoding significantly degrades overall
> > > performance in favor of data size reduction and is suitable only for Gets,
> > > not for Scans.
> > >
> > > Best regards,
> > > -Vladimir Rodionov
> > >
> >
> >
>
