hbase-dev mailing list archives

From Vladimir Rodionov <vladrodio...@gmail.com>
Subject Re: Beware of PREFIX_TREE block encoding
Date Sun, 20 Oct 2013 17:06:02 GMT
FAST_DIFF:
Time to read all 1.3M rows reported in ms.

encoding = NONE,        scanner = StoreScanner;     time = 300 ms
encoding = PREFIX_TREE, scanner = StoreScanner;     time = 860 ms
encoding = FAST_DIFF,   scanner = StoreScanner;     time = 460 ms
encoding = NONE,        scanner = StoreFileScanner; time =  52 ms
encoding = PREFIX_TREE, scanner = StoreFileScanner; time = 545 ms
encoding = FAST_DIFF,   scanner = StoreFileScanner; time = 195 ms

-Vladimir
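
The slowdown of each encoding relative to NONE follows directly from the timings reported above. A minimal sketch in Python (the names `timings_ms` and `slowdown_vs_none` are illustrative only, not part of HBase or of the original test harness):

```python
# Timings in ms to read all 1.3M rows, copied from the message above.
timings_ms = {
    ("StoreScanner", "NONE"): 300,
    ("StoreScanner", "PREFIX_TREE"): 860,
    ("StoreScanner", "FAST_DIFF"): 460,
    ("StoreFileScanner", "NONE"): 52,
    ("StoreFileScanner", "PREFIX_TREE"): 545,
    ("StoreFileScanner", "FAST_DIFF"): 195,
}


def slowdown_vs_none(timings):
    """Return each timing divided by the NONE timing for the same scanner."""
    ratios = {}
    for (scanner, encoding), ms in timings.items():
        baseline = timings[(scanner, "NONE")]
        ratios[(scanner, encoding)] = round(ms / baseline, 2)
    return ratios


ratios = slowdown_vs_none(timings_ms)
for (scanner, encoding), ratio in sorted(ratios.items()):
    print(f"{scanner:17s} {encoding:12s} {ratio:.2f}x")
```

With these numbers, PREFIX_TREE is roughly 10.5x slower than NONE through StoreFileScanner and about 2.9x slower through StoreScanner; FAST_DIFF is about 3.75x and 1.5x slower respectively.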



On Sun, Oct 20, 2013 at 4:06 AM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Vladimir, any chance to run the same test with FAST_DIFF?
>
> J
>
>
> 2013/10/20 Vladimir Rodionov <vladrodionov@gmail.com>
>
> > I wanted to try PREFIX_TREE because it is supposed to be fastest on
> > seek/reseek.
> >
> >
> > On Sat, Oct 19, 2013 at 9:12 PM, lars hofhansl <larsh@apache.org> wrote:
> >
> > > I found FAST_DIFF to be the fastest of the block encoders.
> > > (Prefix tree is in 0.96+ only as far as I know.)
> > >
> > > -- Lars
> > >
> > >
> > >
> > > ----- Original Message -----
> > > From: Vladimir Rodionov <vladrodionov@gmail.com>
> > > To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl <
> > > larsh@apache.org>
> > > Cc:
> > > Sent: Saturday, October 19, 2013 9:08 PM
> > > Subject: Re: Beware of PREFIX_TREE block encoding
> > >
> > > > *Now, which encoder did you test specifically? I've seen a 20-40% slowdown
> > > when everything is in the blockcache (which is the worst case scenario
> > > here), certainly not a 10x slowdown.*
> > >
> > > I have 1.3M rows (very small - 48 bytes) in a block cache which I read
> > > sequentially, using encoding NONE, PREFIX_TREE and
> > > StoreScanner/StoreFileScanner (close to metal - block cache :)
> > >
> > > Time to read all 1.3M rows reported in ms.
> > >
> > > > encoding = NONE,        scanner = StoreScanner;     time = 300 ms
> > > > encoding = PREFIX_TREE, scanner = StoreScanner;     time = 860 ms
> > > > encoding = NONE,        scanner = StoreFileScanner; time =  52 ms
> > > > encoding = PREFIX_TREE, scanner = StoreFileScanner; time = 545 ms
> > >
> > > -Vladimir
> > >
> > >
> > >
> > >
> > > > On Sat, Oct 19, 2013 at 8:50 PM, lars hofhansl <larsh@apache.org> wrote:
> > >
> > > > > That is (unfortunately) a known issue. The main problem is that HBase
> > > > > expects each KV to be backed by a contiguous byte[]. For any prefix
> > > > > encoding it is thus necessary to rematerialize the KV (i.e. copy all
> > > > > the partial bytes into a new location).
> > > > > That is inefficient. Nobody has taken on fixing this (we're halfway
> > > > > there with Cells in 0.96, though).
> > > >
> > > > > There are JIRAs out there to fix this, like HBASE-7320 and more
> > > > > recently HBASE-9794.
> > > >
> > > > > Now, which encoder did you test specifically? I've seen a 20-40%
> > > > > slowdown when everything is in the blockcache (which is the worst
> > > > > case scenario here), certainly not a 10x slowdown.
> > > >
> > > > > Note that with block encoding the blocks are stored encoded in the
> > > > > blockcache, so more data fits into the cache, and (obviously) there's
> > > > > less IO when the data is not in the cache. So the extra CPU cycles
> > > > > and memory bandwidth used are offset by that.
> > > >
> > > > > There are other problems too. I just filed an issue (HBASE-9807)
> > > > > where with block encoders we make a copy of the key portion of the
> > > > > KV on each reseek, just to compare it to the current scan key.
> > > >
> > > > -- Lars
> > > > ________________________________
> > > > From: Vladimir Rodionov <vrodionov@carrieriq.com>
> > > > To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> > > > Sent: Saturday, October 19, 2013 7:34 PM
> > > > Subject: RE: Beware of PREFIX_TREE block encoding
> > > >
> > > >
> > > > > What did I want to say by this? HBase still does not have a block
> > > > > encoding that is optimal for both scan and seek (re-seek).
> > > > > I do not think these goals are mutually exclusive.
> > > >
> > > >
> > > > Best regards,
> > > > Vladimir Rodionov
> > > > Principal Platform Engineer
> > > > Carrier IQ, www.carrieriq.com
> > > > e-mail: vrodionov@carrieriq.com
> > > >
> > > > ________________________________________
> > > >
> > > > From: Vladimir Rodionov [vladrodionov@gmail.com]
> > > > Sent: Saturday, October 19, 2013 7:32 PM
> > > > To: dev@hbase.apache.org
> > > > Subject: Beware of PREFIX_TREE block encoding
> > > >
> > > > > The scan performance is bad: 10x slower in my tests than for blocks
> > > > > with NONE encoding. I scan data directly from the block cache through
> > > > > StoreFileScanner (bypassing all the StoreScanner/KeyValueHeap stuff).
> > > > > It should be clearly stated that this encoding degrades overall
> > > > > performance significantly in favor of data size reduction and is
> > > > > suitable only for Gets, not for Scans.
> > > >
> > > > Best regards,
> > > > -Vladimir Rodionov
> > > >
> > > > -
> > > >
> > > > Confidentiality Notice:  The information contained in this message,
> > > > including any attachments hereto, may be confidential and is intended
> > to
> > > be
> > > > read only by the individual or entity to whom this message is
> > addressed.
> > > If
> > > > the reader of this message is not the intended recipient or an agent
> or
> > > > designee of the intended recipient, please note that any review, use,
> > > > disclosure or distribution of this message or its attachments, in any
> > > form,
> > > > is strictly prohibited.  If you have received this message in error,
> > > please
> > > > immediately notify the sender and/or Notifications@carrieriq.com and
> > > > delete or destroy any copy of this message and its attachments.
> > > >
> > >
> > >
> >
>
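
The rematerialization cost described in the quoted thread (each KV has to be backed by a contiguous byte[], so a prefix-encoded key must be copied out before it can be handed to the scanner) can be illustrated with a minimal sketch. This is plain Python rather than HBase code; the function names `prefix_encode` and `scan_rematerialized` are hypothetical, and the scheme shown is a simple shared-prefix encoding assumed for illustration, not the actual PREFIX_TREE or FAST_DIFF on-disk layout.

```python
def prefix_encode(keys):
    """Encode sorted keys as (shared_prefix_length, suffix) pairs."""
    encoded = []
    prev = b""
    for key in keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def scan_rematerialized(encoded):
    """Yield full keys, copying prefix and suffix into a new buffer each time.

    The concatenation below allocates a fresh contiguous bytes object for
    every key, which is the per-KV copy that an unencoded block avoids.
    """
    prev = b""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix  # new contiguous buffer per key
        yield key
        prev = key


keys = [b"row0001/cf:a", b"row0001/cf:b", b"row0002/cf:a"]
encoded = prefix_encode(keys)
decoded = list(scan_rematerialized(encoded))
print(encoded)
print(decoded == keys)
```

The encoded form is smaller because shared prefixes are stored once, which matches the point about more data fitting into the block cache, but every key read during a sequential scan pays for a copy.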
