hbase-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-8765) split should be based on store size, not HFile size
Date Wed, 19 Jun 2013 00:14:20 GMT

     [ https://issues.apache.org/jira/browse/HBASE-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HBASE-8765:
------------------------------------

    Description: 
I noticed that the current split behavior interacts poorly with compactions.
On large regions, a split is triggered when an HFile exceeds the size limit. The split is followed
by a major compaction to get rid of the partial reference files. However, most of the time the
HFile size limit is surpassed as the result of a compaction.
So, first we rewrite a lot of data into a new file. Then we say "Oh look! A large file!",
split the region, and rewrite everything again.

Perhaps region split should be based on store size, or on incoming compaction size: a large enough
compaction should be converted into a split.

Thoughts? I think basing the split on store size is a simple fix, and I will code it up soon if there
are no objections.
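
To make the difference concrete, here is a minimal sketch of the two checks. It is not the actual HBase split policy code: the class name SplitCheckSketch, both method names, and the 10 GiB limit are hypothetical and chosen only for illustration. A store is modeled as a plain list of HFile sizes in bytes.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical names, not part of the HBase API.
public class SplitCheckSketch {

    // Check as described in the issue: split once any single HFile exceeds the limit.
    static boolean shouldSplitByLargestFile(List<Long> hfileSizes, long limit) {
        long largest = 0;
        for (long size : hfileSizes) {
            largest = Math.max(largest, size);
        }
        return largest > limit;
    }

    // Proposed check: split once the total size of the store exceeds the limit.
    static boolean shouldSplitByStoreSize(List<Long> hfileSizes, long limit) {
        long total = 0;
        for (long size : hfileSizes) {
            total += size;
        }
        return total > limit;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long limit = 10 * gib; // assumed limit, for illustration only
        // Three 4 GiB files: no single file is over the limit, but the store is.
        List<Long> store = Arrays.asList(4 * gib, 4 * gib, 4 * gib);

        System.out.println("largest-file check: " + shouldSplitByLargestFile(store, limit));
        System.out.println("store-size check:   " + shouldSplitByStoreSize(store, limit));
    }
}
```

With three 4 GiB files and a 10 GiB limit, the largest-file check does not fire until a compaction merges the files into one 12 GiB file, at which point the region splits and the data is rewritten a second time. The store-size check fires before that compaction, which is the rewrite this issue is trying to avoid.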

  was:
I noticed that the current split behavior is rather suboptimal with regard to compactions.
On large regions, HFile size limit triggers a split. Split is followed by major compaction
to get rid of the partial reference files. However, HFile size limit is surpassed after compaction
most of the time.
So, first we rewrite a lot of data into a new file. Then we say "Oh look! A large file!",
split the region and rewrite everything again.

Perhaps region split should be based on region size, or incoming compaction size - large enough
compaction should be converted into splits.

Thoughts? I think basing off region size is a simple fix, and will code it up soon if there
are no objections

    
> split should be based on store size, not HFile size
> ---------------------------------------------------
>
>                 Key: HBASE-8765
>                 URL: https://issues.apache.org/jira/browse/HBASE-8765
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.95.1
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
