hbase-dev mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Beware of PREFIX_TREE block encoding
Date Sun, 20 Oct 2013 03:50:39 GMT
That is (unfortunately) a known issue. The main problem is that HBase expects each KV to be
backed by a contiguous byte[]. For any prefix encoding it is thus necessary to rematerialize
the KV (i.e. copy all the partial bytes into a new location).
That is inefficient. Nobody has taken on fixing this yet (we're halfway there with Cells in
0.96, though).
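
To make the rematerialization cost concrete, here is a minimal sketch in plain Java. It is not HBase code and not the PREFIX_TREE format; it assumes a simplified scheme where each key stores only the number of bytes it shares with the previous key plus the remaining suffix. The class and method names (PrefixRematerializeSketch, Encoded, encode, rematerialize) are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical illustration of simple prefix encoding; not HBase code. */
final class PrefixRematerializeSketch {

    /** One encoded entry: bytes shared with the previous key, plus the new suffix. */
    static final class Encoded {
        final int sharedPrefixLength;
        final byte[] suffix;

        Encoded(int sharedPrefixLength, byte[] suffix) {
            this.sharedPrefixLength = sharedPrefixLength;
            this.suffix = suffix;
        }
    }

    /** Encodes 'current' relative to 'previous' by dropping the common prefix. */
    static Encoded encode(byte[] previous, byte[] current) {
        int max = Math.min(previous.length, current.length);
        int shared = 0;
        while (shared < max && previous[shared] == current[shared]) {
            shared++;
        }
        return new Encoded(shared, Arrays.copyOfRange(current, shared, current.length));
    }

    /**
     * Rebuilds the full key as one contiguous byte[]. This allocation and the
     * two copies are the per-KV cost that a consumer expecting a contiguous
     * array forces on every decoded entry.
     */
    static byte[] rematerialize(byte[] previous, Encoded e) {
        byte[] full = new byte[e.sharedPrefixLength + e.suffix.length];
        System.arraycopy(previous, 0, full, 0, e.sharedPrefixLength);
        System.arraycopy(e.suffix, 0, full, e.sharedPrefixLength, e.suffix.length);
        return full;
    }
}
```

In this sketch a scan over N encoded keys performs N allocations and copies that an unencoded block would not need, because there the key bytes are already contiguous.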

There are JIRAs out there to fix this, like HBASE-7320 and more recently HBASE-9794.

Now, which encoder did you test specifically? I have seen a 20-40% slowdown when everything
is in the blockcache (which is the worst-case scenario here), certainly not a 10x slowdown.

Note that with block encoding the blocks are stored encoded in the blockcache, so more data
fits into the cache, and (obviously) there's less IO when the data is not in the cache. So
the extra CPU cycles and memory bandwidth used are offset by that.

There are other problems too. I just filed an issue (HBASE-9807) where with block encoders
we make a copy of the key portion of the KV on each reseek, just to compare it to the current
scan key.
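
The difference between the two approaches can be sketched in plain Java as follows. This is not the HBase code path and not necessarily how HBASE-9807 will be resolved; it assumes the key bytes sit contiguously in the decoded buffer, and all names (ReseekCompareSketch, compareWithCopy, compareInPlace, compareRange) are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical illustration of copy-then-compare versus in-place compare. */
final class ReseekCompareSketch {

    /** Copies the key out of the buffer first, then compares: one allocation per call. */
    static int compareWithCopy(byte[] block, int keyOffset, int keyLength, byte[] scanKey) {
        byte[] keyCopy = Arrays.copyOfRange(block, keyOffset, keyOffset + keyLength);
        return compareRange(keyCopy, 0, keyCopy.length, scanKey, 0, scanKey.length);
    }

    /** Compares the key where it already lies in the buffer: no allocation. */
    static int compareInPlace(byte[] block, int keyOffset, int keyLength, byte[] scanKey) {
        return compareRange(block, keyOffset, keyLength, scanKey, 0, scanKey.length);
    }

    /** Unsigned lexicographic comparison of two byte ranges. */
    static int compareRange(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff;
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen;
    }
}
```

Both methods return the same ordering; the second simply avoids the temporary array on every reseek.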

-- Lars
________________________________
From: Vladimir Rodionov <vrodionov@carrieriq.com>
To: "dev@hbase.apache.org" <dev@hbase.apache.org> 
Sent: Saturday, October 19, 2013 7:34 PM
Subject: RE: Beware of PREFIX_TREE block encoding


What did I want to say by this? HBase still does not have a block encoding that is optimal
for both scan and seek (re-seek).
I do not think these goals are mutually exclusive.


Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________

From: Vladimir Rodionov [vladrodionov@gmail.com]
Sent: Saturday, October 19, 2013 7:32 PM
To: dev@hbase.apache.org
Subject: Beware of PREFIX_TREE block encoding

The scan performance is bad: 10x slower in my tests than for blocks with
NONE encoding. I scan data directly from the block cache through
StoreFileScanner (bypassing all the StoreScanner/KeyValueHeap stuff). It should
be clearly stated that this encoding degrades overall performance
significantly in favor of data size reduction and is suitable only for Gets,
not for Scans.

Best regards,
-Vladimir Rodionov

