hbase-issues mailing list archives

From "Aleksandr Shulman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7342) Split operation without split key incorrectly finds the middle key in off-by-one error
Date Thu, 13 Dec 2012 18:04:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531247#comment-13531247 ]

Aleksandr Shulman commented on HBASE-7342:

Hi Ramkrishna,

The logic for the change is as follows:

With the existing implementation (using -1), when there are two items in the array, the
calculation returns index 0 ( (2 - 1) / 2 = 0 ), which is the index of the firstKey. This is
a problem during splits because a split is invalid if the midkey is equal to the firstKey.
What we really want here is index 1 ( 2 / 2 = 1 ). The lastKey is going to be the first key
in the next block, so the midkey won't collide with it and will really represent the middle
of first and last.
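
To make the arithmetic concrete, here is a minimal sketch comparing the two index calculations under Java integer division. The class and method names are hypothetical and are not taken from the HBase source; only the two formulas come from the discussion above.

```java
// Hypothetical sketch: names are illustrative, not the actual HBase code.
final class MidKeyIndexSketch {

    // Existing behaviour described above: subtract 1 before halving.
    static int withMinusOne(int count) {
        return (count - 1) / 2;
    }

    // Proposed behaviour: halve the count directly.
    static int withoutMinusOne(int count) {
        return count / 2;
    }

    public static void main(String[] args) {
        // Print both candidate indices for a few small array sizes.
        for (int count = 1; count <= 5; count++) {
            System.out.println("count=" + count
                    + " (count-1)/2=" + withMinusOne(count)
                    + " count/2=" + withoutMinusOne(count));
        }
    }
}
```

For a count of 2, the first formula gives index 0 (the firstKey) and the second gives index 1. The two formulas agree whenever the count is odd and differ by one whenever it is even.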
> Split operation without split key incorrectly finds the middle key in off-by-one error
> --------------------------------------------------------------------------------------
>                 Key: HBASE-7342
>                 URL: https://issues.apache.org/jira/browse/HBASE-7342
>             Project: HBase
>          Issue Type: Bug
>          Components: HFile, io
>    Affects Versions: 0.94.1, 0.94.2, 0.94.3, 0.96.0
>            Reporter: Aleksandr Shulman
>            Assignee: Aleksandr Shulman
>            Priority: Minor
>             Fix For: 0.96.0, 0.94.4
>         Attachments: HBASE-7342-v1.patch, HBASE-7342-v2.patch
> I took a deeper look into issues I was having with region splitting when specifying
> a region (but not a key for splitting).
> The midkey calculation is off by one and, when there are 2 rows, will pick the 0th one.
> This causes the firstkey to be the same as the midkey, and the split will fail. Removing the -1
> causes it to work correctly, as per the test I've added.
> Looking into the code here is what goes on:
> 1. Split takes the largest storefile
> 2. It puts all the keys into a 2-dimensional array called blockKeys[][]. Key i resides
> at blockKeys[i]
> 3. Getting the middle root-level index should yield the key in the middle of the storefile
> 4. In step 3, we see that there is a possibly erroneous (-1) to adjust for the 0-offset
> 5. In a case where there are only 2 blockKeys, this yields the 0th block key.
> 6. Unfortunately, this is the same block key that 'firstKey' will be.
> 7. This yields the result in HStore.java:1873 ("cannot split because midkey is the same
> as first or last row")
> 8. Removing the -1 solves the problem (in this case). 
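
The steps above can be sketched as a small simulation. This is a simplification under stated assumptions: `SplitCheckSketch`, `keysOf`, and `canSplit` are hypothetical names, whole keys are compared rather than rows, and only the first-key half of the "same as first or last row" check is modelled. It is not the HStore implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the failure mode; not the actual HBase code.
final class SplitCheckSketch {

    // Builds a blockKeys-style 2-dimensional array from row strings (illustrative helper).
    static byte[][] keysOf(String... rows) {
        byte[][] keys = new byte[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            keys[i] = rows[i].getBytes(StandardCharsets.UTF_8);
        }
        return keys;
    }

    // Simplified check: the split is rejected when the midkey equals the first key.
    // The comparison against the last row is deliberately left out of this sketch.
    static boolean canSplit(byte[][] blockKeys, int midIndex) {
        byte[] firstKey = blockKeys[0];
        byte[] midKey = blockKeys[midIndex];
        return !Arrays.equals(firstKey, midKey);
    }

    public static void main(String[] args) {
        byte[][] blockKeys = keysOf("row-a", "row-b");
        int n = blockKeys.length;
        System.out.println("with -1:    canSplit=" + canSplit(blockKeys, (n - 1) / 2));
        System.out.println("without -1: canSplit=" + canSplit(blockKeys, n / 2));
    }
}
```

With two block keys, the index computed with the -1 points at the first key, so this simplified check rejects the split. Without the -1, the index points at the second key and the check passes.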

