hbase-dev mailing list archives

From Vladimir Rodionov <vladrodio...@gmail.com>
Subject Re: Beware of PREFIX_TREE block encoding
Date Sun, 20 Oct 2013 04:08:27 GMT
*Now, which encoder did you test specifically? I've seen a 20-40% slowdown
when everything is in the blockcache (which is the worst case scenario
here), certainly not a 10x slowdown.*

I have 1.3M rows (very small, 48 bytes each) in the block cache, which I
read sequentially with encodings NONE and PREFIX_TREE, through both
StoreScanner and StoreFileScanner (the latter is close to the metal: it
reads straight from the block cache :)

Time to read all 1.3M rows reported in ms.

encoding = NONE,        scanner = StoreScanner;     time = 300 ms
encoding = PREFIX_TREE, scanner = StoreScanner;     time = 860 ms
encoding = NONE,        scanner = StoreFileScanner; time = 52  ms
encoding = PREFIX_TREE, scanner = StoreFileScanner; time = 545 ms
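
For reference, the slowdown factors implied by these four timings can be worked out directly. The snippet below is plain arithmetic on the reported numbers, not a benchmark; the class and method names are made up for illustration.

```java
public class SlowdownRatios {
    // Ratio of the encoded scan time to the unencoded (NONE) scan time.
    static double slowdown(double noneMs, double encodedMs) {
        return encodedMs / noneMs;
    }

    // Average time per row, in nanoseconds, for a scan that took totalMs.
    static double nanosPerRow(double totalMs, long rows) {
        return totalMs * 1_000_000.0 / rows;
    }

    public static void main(String[] args) {
        long rows = 1_300_000L; // row count reported in the message

        // Timings (ms) copied from the table above.
        System.out.printf("StoreScanner slowdown:     %.1fx%n", slowdown(300, 860));
        System.out.printf("StoreFileScanner slowdown: %.1fx%n", slowdown(52, 545));

        System.out.printf("NONE, StoreFileScanner:        %.0f ns/row%n",
                nanosPerRow(52, rows));
        System.out.printf("PREFIX_TREE, StoreFileScanner: %.0f ns/row%n",
                nanosPerRow(545, rows));
    }
}
```

This works out to roughly 2.9x through StoreScanner and roughly 10.5x through StoreFileScanner, so the 10x figure refers to the raw StoreFileScanner path, where the fixed per-row cost of the upper scanner layers is not there to mask the decoding cost.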

-Vladimir



On Sat, Oct 19, 2013 at 8:50 PM, lars hofhansl <larsh@apache.org> wrote:

> That is (unfortunately) a known issue. The main problem is that HBase
> expects each KV to be backed by a contiguous byte[]. For any prefix
> encoding it is thus necessary to rematerialize the KV (i.e. copy all the
> partial bytes into a new location).
> That is inefficient. Nobody has taken on fixing this yet (we're halfway
> there with Cells in 0.96, though).
>
> There are JIRAs out there to fix this, like HBASE-7320 and, more recently,
> HBASE-9794.
>
> Now, which encoder did you test specifically? I've seen a 20-40% slowdown
> when everything is in the blockcache (which is the worst case scenario
> here), certainly not a 10x slowdown.
>
> Note that with block encoding the blocks are stored encoded in the
> blockcache, so more data fits into the cache, and (obviously) there's less
> IO when the data is not in the cache. So the extra CPU cycles and memory
> bandwidth used are offset by that.
>
> There are other problems too. I just filed an issue (HBASE-9807) where with
> block encoders we make a copy of the key portion of the KV on each reseek,
> just to compare it to the current scan key.
>
> -- Lars
> ________________________________
> From: Vladimir Rodionov <vrodionov@carrieriq.com>
> To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> Sent: Saturday, October 19, 2013 7:34 PM
> Subject: RE: Beware of PREFIX_TREE block encoding
>
>
> What did I want to say by this? HBase still does not have a block encoding
> that is optimal for both scan and seek (re-seek).
> I do not think these goals are mutually exclusive.
>
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
>
> From: Vladimir Rodionov [vladrodionov@gmail.com]
> Sent: Saturday, October 19, 2013 7:32 PM
> To: dev@hbase.apache.org
> Subject: Beware of PREFIX_TREE block encoding
>
> The scan performance is bad: 10x slower in my tests than for blocks with
> NONE encoding. I scan data directly from the block cache through
> StoreFileScanner (bypassing all the StoreScanner/KeyValueHeap stuff). It
> should be clearly stated that this encoding degrades overall performance
> significantly in favor of data size reduction and is suitable only for Gets,
> not for Scans.
>
> Best regards,
> -Vladimir Rodionov
>
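
To illustrate the rematerialization cost Lars describes in the quoted message, here is a minimal sketch of a generic prefix-delta scheme. It is not HBase's PREFIX_TREE codec (which is trie-based), and none of the class or method names below are HBase APIs; they are hypothetical. The point is only that when the consumer requires one contiguous byte[] per key, every decoded entry costs an allocation plus a copy of the shared prefix and the suffix.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixRematerializeSketch {
    // One encoded entry: the number of leading bytes shared with the
    // previous key, plus the bytes that differ. (Hypothetical layout.)
    static final class Encoded {
        final int sharedPrefixLen;
        final byte[] suffix;

        Encoded(int sharedPrefixLen, byte[] suffix) {
            this.sharedPrefixLen = sharedPrefixLen;
            this.suffix = suffix;
        }
    }

    // Encode cur relative to prev by dropping the common prefix.
    static Encoded encode(byte[] prev, byte[] cur) {
        int max = Math.min(prev.length, cur.length);
        int shared = 0;
        while (shared < max && prev[shared] == cur[shared]) {
            shared++;
        }
        return new Encoded(shared, Arrays.copyOfRange(cur, shared, cur.length));
    }

    // Rebuild a contiguous key. This allocation and double copy happens
    // once per entry when the reader insists on a single backing byte[].
    static byte[] rematerialize(byte[] prev, Encoded e) {
        byte[] out = new byte[e.sharedPrefixLen + e.suffix.length];
        System.arraycopy(prev, 0, out, 0, e.sharedPrefixLen);
        System.arraycopy(e.suffix, 0, out, e.sharedPrefixLen, e.suffix.length);
        return out;
    }

    public static void main(String[] args) {
        // Made-up keys, for illustration only.
        byte[] k1 = "row-000001/cf:q".getBytes(StandardCharsets.UTF_8);
        byte[] k2 = "row-000002/cf:q".getBytes(StandardCharsets.UTF_8);

        Encoded e = encode(k1, k2);
        byte[] rebuilt = rematerialize(k1, e);

        System.out.println("shared prefix bytes: " + e.sharedPrefixLen);
        System.out.println("suffix bytes stored: " + e.suffix.length);
        System.out.println("round trip ok: " + Arrays.equals(k2, rebuilt));
    }
}
```

With NONE encoding a reader can point at the key bytes already sitting in the cached block, so there is no equivalent of rematerialize() on the scan path. A Cell-style interface that exposes the prefix and suffix as separate (array, offset, length) views would, in principle, avoid the copy.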
