hbase-dev mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Reopened] (HBASE-14867) SimpleRegionNormalizer needs to have better heuristics to trigger merge operation
Date Mon, 04 Jan 2016 09:31:39 GMT

     [ https://issues.apache.org/jira/browse/HBASE-14867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Enis Soztutar reopened HBASE-14867:

Reopening this since I think the patch has a problem. 

This logic: 
    while (candidateIdx < tableRegions.size()-1) {
      if (Math.abs(regionsWithSize.get(candidateIdx).getThird() -
        regionsWithSize.get(candidateIdx + 1).getThird()) == 1) {
only looks for regions that are neighbors and whose sizes are also adjacent in the sorted
regionsWithSize array. This does not seem like it will be useful in real-world region

I think we should look for all neighboring region pairs and decide to split or merge for all
possible pairs. What about the other suggestions in my first comment? 
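
A minimal sketch of the "look at all neighboring pairs" idea, for discussion only. It is not the patch and not the HBase API: the class name, the method name, and the plain long[] of region sizes in key order are all hypothetical, and the "combined size below the table average" threshold is borrowed from the issue description. It covers the merge decision only; split handling is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in the HBase code base.
class NeighborPairScan {

  /**
   * Walks the regions in key order and returns every index i such that
   * regions i and i + 1 together are smaller than the average region size.
   */
  static List<Integer> mergeCandidates(long[] sizesInKeyOrder) {
    List<Integer> candidates = new ArrayList<>();
    if (sizesInKeyOrder.length < 2) {
      return candidates; // nothing to merge with
    }

    long total = 0;
    for (long size : sizesInKeyOrder) {
      total += size;
    }
    double avg = (double) total / sizesInKeyOrder.length;

    int i = 0;
    while (i < sizesInKeyOrder.length - 1) {
      long combined = sizesInKeyOrder[i] + sizesInKeyOrder[i + 1];
      if (combined < avg) {
        candidates.add(i);
        i += 2; // assumption: a region takes part in at most one merge per pass
      } else {
        i++;
      }
    }
    return candidates;
  }
}
```

With assumed sizes of {0, 100, 100, 100, 27, 27} for r1 to r6 (arbitrary units), the average is 59 and the only pair below it is the last one, so the scan returns index 4 (r5 and r6). Every adjacent pair is examined, so the result does not depend on where the pair falls in a size-sorted list.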

> SimpleRegionNormalizer needs to have better heuristics to trigger merge operation
> ---------------------------------------------------------------------------------
>                 Key: HBASE-14867
>                 URL: https://issues.apache.org/jira/browse/HBASE-14867
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.2.0
>            Reporter: Romil Choksi
>            Assignee: Ted Yu
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>         Attachments: 14867-v2.txt, 14867-v3.txt, 14867-v4.txt
> SimpleRegionNormalizer needs to have better heuristics to trigger merge operation. SimpleRegionNormalizer
> is not able to trigger a merge action if the table's smallest region has neighboring regions
> that are larger than the table's average region size, even when there are other small regions
> whose combined size is less than the average region size.
> For example, 
> - Consider a table with six regions, say r1 to r6.
> - Keep r1 empty and create some data, say 100K rows, for each of the regions
> r2, r3 and r4. Create a smaller amount of data for regions r5 and r6, say about 27K rows of
> - Run the normalizer. Verify the number of regions for that table and also check the
> master log to see if any merge action was triggered as a result of normalization.
> In such a scenario, it would be better to have a merge action triggered for those two smaller
> regions r5 and r6, even though neither of them is the smallest one.
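
To make the quoted scenario concrete, here is a rough sketch of a "smallest region only" merge check as the description characterizes it. This is a simplification written from the description, not the actual SimpleRegionNormalizer source: the class and method names are hypothetical, sizes are a plain long[] in key order, and the sample values in the note below are assumed.

```java
// Hypothetical simplification of the behavior the issue describes.
class SmallestRegionHeuristic {

  /**
   * Considers only the smallest region and its smaller neighbor.
   * Returns the index of the left region of the pair to merge, or -1 if none.
   */
  static int mergeCandidate(long[] sizesInKeyOrder) {
    int n = sizesInKeyOrder.length;
    if (n < 2) {
      return -1;
    }

    long total = 0;
    int smallest = 0;
    for (int i = 0; i < n; i++) {
      total += sizesInKeyOrder[i];
      if (sizesInKeyOrder[i] < sizesInKeyOrder[smallest]) {
        smallest = i;
      }
    }
    double avg = (double) total / n;

    // Pick the smaller of the (at most two) neighbors of the smallest region.
    int neighbor;
    if (smallest == 0) {
      neighbor = 1;
    } else if (smallest == n - 1) {
      neighbor = n - 2;
    } else {
      neighbor = sizesInKeyOrder[smallest - 1] <= sizesInKeyOrder[smallest + 1]
          ? smallest - 1
          : smallest + 1;
    }

    long combined = sizesInKeyOrder[smallest] + sizesInKeyOrder[neighbor];
    return combined < avg ? Math.min(smallest, neighbor) : -1;
  }
}
```

With assumed sizes {0, 100, 100, 100, 27, 27} for r1 to r6, the smallest region is r1, its only neighbor is r2, and their combined size of 100 exceeds the average of 59, so the check returns -1. r5 and r6 together come to 54, below the average, but they are never examined.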

This message was sent by Atlassian JIRA
