hbase-dev mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-8765) split should be based on store size, not HFile size
Date Wed, 19 Jun 2013 00:14:19 GMT
Sergey Shelukhin created HBASE-8765:

             Summary: split should be based on store size, not HFile size
                 Key: HBASE-8765
                 URL: https://issues.apache.org/jira/browse/HBASE-8765
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 0.95.1
            Reporter: Sergey Shelukhin
            Assignee: Sergey Shelukhin

I noticed that the current split behavior is rather suboptimal with regard to compactions.
On large regions, the HFile size limit triggers a split. The split is followed by a major
compaction to get rid of the partial reference files. However, most of the time the HFile
size limit is surpassed after a compaction.
So, first we rewrite a lot of data into a new file. Then we say "Oh look! A large file!",
split the region, and rewrite everything again.

Perhaps region split should be based on region size, or on incoming compaction size: a large
enough compaction should be converted into a split.
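
To make the difference concrete, here is a rough sketch of the two checks. It is not the
actual HBase code: the class name, method names, and the 10 GB threshold are all made up for
illustration, and a store is modeled as nothing more than a list of file sizes.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; none of these names are HBase APIs.
public class SplitCheckSketch {

    // File-based check: split only once a single file exceeds the limit.
    static boolean shouldSplitByLargestFile(List<Long> fileSizes, long maxSize) {
        long largest = 0;
        for (long size : fileSizes) {
            largest = Math.max(largest, size);
        }
        return largest > maxSize;
    }

    // Size-based check: split once the files together exceed the limit.
    static boolean shouldSplitByStoreSize(List<Long> fileSizes, long maxSize) {
        long total = 0;
        for (long size : fileSizes) {
            total += size;
        }
        return total > maxSize;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long maxSize = 10 * gb; // hypothetical limit

        // Three 4 GB files before compaction, one 12 GB file after it.
        List<Long> beforeCompaction = Arrays.asList(4 * gb, 4 * gb, 4 * gb);
        List<Long> afterCompaction = Arrays.asList(12 * gb);

        System.out.println("before, by largest file: "
                + shouldSplitByLargestFile(beforeCompaction, maxSize));
        System.out.println("before, by store size:   "
                + shouldSplitByStoreSize(beforeCompaction, maxSize));
        System.out.println("after,  by largest file: "
                + shouldSplitByLargestFile(afterCompaction, maxSize));
        System.out.println("after,  by store size:   "
                + shouldSplitByStoreSize(afterCompaction, maxSize));
    }
}
```

With these made-up numbers, the file-based check only fires once the compaction has already
rewritten 12 GB into a single file, while the size-based check fires before that rewrite
happens. A region-level version would sum over all stores in the region in the same way.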

Thoughts? I think basing the split on region size is a simple fix, and I will code it up soon
if there are no objections.

