hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12313) Redo the hfile index length optimization so cell-based rather than serialized KV key
Date Sun, 26 Oct 2014 16:54:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184538#comment-14184538 ]

stack commented on HBASE-12313:
-------------------------------

bq. Do you want to change really Stack?

This patch cleans up the CellUtil methods that do size counting.  There were a few too many
methods, each only slightly different from the others.  In this particular case, we are just
doing an estimate, and serialized size is probably closest to what we are putting on the wire
at this stage.  I don't see a problem with it being slightly different from what was there
before (what was there before was also an 'estimate').  Do you?

bq. Replacing estimatedLengthOf with estimatedSerializedSizeOf is correct?

Where we were using estimatedLengthOf (what is this anyway -- smile?  Serialized 'length'
or size on heap?  Or size of the serialized KeyValue byte array, which is going away?), we
were talking about serialized size.  I thought estimatedSerializedSizeOf was more appropriate
in the places where I did the replacements.
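
To make the 'serialized size' versus 'heap size' distinction concrete, here is a rough sketch
of what a serialized-size estimate could add up.  It assumes the classic KeyValue wire layout
(4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte
timestamp, 1-byte type) and ignores tags.  The class and method names are made up for
illustration; this is not the CellUtil implementation.

```java
// Illustrative only: a hand-rolled estimate, not CellUtil.estimatedSerializedSizeOf.
class SerializedSizeSketch {
    // Fixed-width fields of the classic KeyValue layout (tags are ignored here).
    static final int KEY_LENGTH_FIELD = 4;
    static final int VALUE_LENGTH_FIELD = 4;
    static final int ROW_LENGTH_FIELD = 2;
    static final int FAMILY_LENGTH_FIELD = 1;
    static final int TIMESTAMP_FIELD = 8;
    static final int TYPE_FIELD = 1;

    /** Sums the fixed-width fields plus the variable-length components. */
    static int estimateSerializedSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        int fixed = KEY_LENGTH_FIELD + VALUE_LENGTH_FIELD + ROW_LENGTH_FIELD
                + FAMILY_LENGTH_FIELD + TIMESTAMP_FIELD + TYPE_FIELD;
        return fixed + rowLen + familyLen + qualifierLen + valueLen;
    }

    public static void main(String[] args) {
        // Hypothetical cell: 3-byte row, 1-byte family, 2-byte qualifier, 10-byte value.
        System.out.println(estimateSerializedSize(3, 1, 2, 10));
    }
}
```

A heap-size estimate would instead count object headers, references and array overhead, so
the two numbers would not be expected to match.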

bq. No need to add the extra 4 bytes for heapSize which will come in estimatedSerializedSizeOf

Are you referring to the TODO?  I'd think that serialized size and heap size will be calculated
differently when we get around to it.

bq. Can we add a separator in between rk, f and q parts?

Whoops.  Will fix.

bq. What if we do seekTo 'h' only ?

There is no 'h' in the dataset.  It was an 'artificial' midpoint.  If you seek to 'h', you end
up in the second block, which starts with 'i'.
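
A toy model of that lookup, for illustration only.  The block contents ('c', 'e', 'g' and
'i', 'k') and all names are assumptions, and the method below is a simplified "first key at
or after" lookup, not the real HFileScanner seekTo contract.  The index holds the artificial
key 'h' as the boundary for the second block.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model: two data blocks and a block index keyed by a boundary key per block.
class BlockIndexSeekSketch {
    // Hypothetical block contents; 'h' appears in no block.
    static final List<List<String>> BLOCKS =
            List.of(List.of("c", "e", "g"), List.of("i", "k"));

    // Index entry for block 1 is the artificial midpoint 'h', not its real first key 'i'.
    static final TreeMap<String, Integer> INDEX = new TreeMap<>();
    static {
        INDEX.put("c", 0);
        INDEX.put("h", 1);
    }

    /** Returns the first stored key at or after the target, or null if there is none. */
    static String firstKeyAtOrAfter(String target) {
        Map.Entry<String, Integer> entry = INDEX.floorEntry(target);
        int start = (entry == null) ? 0 : entry.getValue();
        // Scan the chosen block, falling through to later blocks if needed.
        for (int b = start; b < BLOCKS.size(); b++) {
            for (String key : BLOCKS.get(b)) {
                if (key.compareTo(target) >= 0) {
                    return key;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The index routes 'h' to the second block, whose first real key is 'i'.
        System.out.println(firstKeyAtOrAfter("h"));
    }
}
```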

bq. Will this change in mid point calc make any issue in reads?

I don't believe so.  This whole area was without tests previously.  I made the mid calc code
stand apart and added a bunch of tests in this patch.  Also, while making this patch, I put
the old code and the new code in place side by side and threw an exception whenever their
results differed as our unit test suite ran.  I looked at each case to see if the difference
was legit.  What I found was that the differences arose because we made midkeys even when
there was no advantage (as in the above 'h' case -- no need to make a midkey if all sizes are
the same).
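
A minimal sketch of the idea, under stated assumptions: it works on a single byte string
(think row only) rather than on Cells component by component, the names are hypothetical, and
it is one reading of the "no advantage" remark rather than the code in the patch.  Given the
last key of one block (left) and the first key of the next (right), it looks for a key m with
left < m <= right, and falls back to right itself when the candidate is no shorter.
Arrays.compareUnsigned needs Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of a 'short midpoint' search over plain byte arrays.
class ShortMidpointSketch {

    /** Returns a key m with left < m <= right (unsigned byte order), never longer than right. */
    static byte[] shortMidpoint(byte[] left, byte[] right) {
        if (Arrays.compareUnsigned(left, right) >= 0) {
            throw new IllegalArgumentException("left must sort before right");
        }
        // Find the first position where the two keys differ.
        int min = Math.min(left.length, right.length);
        int diff = 0;
        while (diff < min && left[diff] == right[diff]) {
            diff++;
        }
        byte[] candidate;
        if (diff == left.length) {
            // left is a strict prefix of right: one more byte of right is enough.
            candidate = Arrays.copyOf(right, diff + 1);
        } else {
            int l = left[diff] & 0xff;
            int r = right[diff] & 0xff;
            if (l + 1 < r) {
                // Room between the differing bytes: bump left's byte by one.
                candidate = Arrays.copyOf(left, diff + 1);
                candidate[diff] = (byte) (l + 1);
            } else {
                // Adjacent bytes: truncate right just after the differing byte.
                candidate = Arrays.copyOf(right, diff + 1);
            }
        }
        // No advantage if the candidate is not shorter: keep the real key.
        return candidate.length < right.length ? candidate : right;
    }

    static String midpoint(String left, String right) {
        byte[] m = shortMidpoint(left.getBytes(StandardCharsets.US_ASCII),
                right.getBytes(StandardCharsets.US_ASCII));
        return new String(m, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(midpoint("the quick brown fox", "the who")); // shortened key
        System.out.println(midpoint("g", "i")); // same size, so the real key is kept
    }
}
```

Under this reading, 'g' and 'i' yield 'i' rather than an artificial 'h', since a one-byte
midpoint saves nothing over a one-byte real key.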



> Redo the hfile index length optimization so cell-based rather than serialized KV key
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-12313
>                 URL: https://issues.apache.org/jira/browse/HBASE-12313
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>            Reporter: stack
>            Assignee: stack
>         Attachments: 0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch,
>                      0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch,
>                      0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch,
>                      0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch,
>                      0001-HBASE-12313-Redo-the-hfile-index-length-optimization.patch,
>                      12313v5.txt
>
>
> Trying to remove the API that returns the 'key' of a KV serialized into a byte array is thorny.
> I tried to move the first and last key serializations and the hfile index entries over
> to be Cell-based, but the patch was turning massive.  Here is a smaller patch that just redoes
> the optimization that tries to find 'short' midpoints between the last key of the last block
> and the first key of the next block so it is Cell-based rather than byte-array-based
> (presuming keys serialized in a certain way).  Adds unit tests, which we didn't have before.
> Also removes CellKey.  Not needed... at least not yet.  It's just a utility for toString.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
