hbase-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2462) Review compaction heuristic and move compaction code out so standalone and independently testable
Date Mon, 19 Apr 2010 22:55:51 GMT

[ https://issues.apache.org/jira/browse/HBASE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858713#action_12858713 ]

Todd Lipcon commented on HBASE-2462:

We talked about this a bit at the hackathon today.

One idea about the heuristic is to measure the actual cost of having multiple storefiles for
a region (this was discussed a bit in HBASE-2457). The overall cost of having a lot of files
is the cost of hitting multiple HFiles on each read. We can easily measure this: whenever we access a
store for a read/scan, we should increment a counter for each store file we touch, based on how much
time we spent accessing it. We can use this data in a number of ways:
- When deciding which files to compact, we know the "cost" of each file. If a file has a
large cost, then including it in the compaction is worth a lot; if it has a small cost, we
won't gain much by compacting it. We can weigh the cost against the size of the file: if it has
been costing us very little but it's a big file, it's not worth compacting.
- We can also divide the sum of the costs by the number of reads, normalized by the cost of a single
uncached file access, to get an "effective number of store files". For example, if we have 10 store
files but 5 of them are completely in block cache, then we effectively have only 5 store files from
the standpoint of the benefit of a compaction. We can use this to prioritize compactions that will
actually help.
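A minimal sketch of the per-file cost accounting described above, written in Java since that is HBase's language. It assumes read cost is measured as nanoseconds spent in each store file during a get or scan; `ReadCostTracker`, its methods, and the cost-per-byte threshold are hypothetical names for illustration, not HBase APIs. The effective file count divides total cost per read by an assumed cost of one uncached file access, so the comment's example of 10 files with 5 fully in block cache comes out as 5.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-store-file read cost accounting; not an HBase API.
// Assumption: cost is wall-clock nanoseconds spent reading a file during a get/scan.
class ReadCostTracker {
    // Accumulated read cost per store file name.
    private final Map<String, Long> costByFile = new HashMap<>();
    // Number of logical reads (gets/scans) served by this store.
    private long reads = 0;

    // Called for every store file a read touches, with the time spent in that file.
    void recordFileAccess(String file, long costNanos) {
        costByFile.merge(file, costNanos, Long::sum);
    }

    // Called once per logical read against the store.
    void recordRead() {
        reads++;
    }

    long costOf(String file) {
        return costByFile.getOrDefault(file, 0L);
    }

    // "Effective number of store files": average cost per read divided by the assumed
    // cost of one uncached file access. Files served from block cache add almost nothing.
    double effectiveFileCount(long uncachedAccessNanos) {
        if (reads == 0) {
            return 0.0;
        }
        long total = 0;
        for (long c : costByFile.values()) {
            total += c;
        }
        return (double) total / reads / uncachedAccessNanos;
    }

    // Cost per byte: high for small, frequently hit files; low for big, cold ones.
    double costPerByte(String file, long sizeBytes) {
        return (double) costOf(file) / Math.max(1L, sizeBytes);
    }

    // Files whose cost per byte meets the (hypothetical) threshold, most valuable first.
    List<String> selectForCompaction(Map<String, Long> sizeBytesByFile, double minCostPerByte) {
        List<String> picked = new ArrayList<>();
        for (Map.Entry<String, Long> e : sizeBytesByFile.entrySet()) {
            if (costPerByte(e.getKey(), e.getValue()) >= minCostPerByte) {
                picked.add(e.getKey());
            }
        }
        picked.sort(Comparator.comparingDouble(
                (String f) -> costPerByte(f, sizeBytesByFile.get(f))).reversed());
        return picked;
    }

    public static void main(String[] args) {
        // The comment's example: each read touches 10 store files, 5 fully in block cache.
        ReadCostTracker tracker = new ReadCostTracker();
        for (int i = 0; i < 10; i++) {
            tracker.recordFileAccess("hfile-" + i, i < 5 ? 1_000_000L : 0L);
        }
        tracker.recordRead();
        System.out.println(tracker.effectiveFileCount(1_000_000L)); // prints 5.0
    }
}
```

Cost per byte is only one way to weigh a file's cost against its size; a small file that is hit often ranks high, while a large file that is rarely read from disk falls below the threshold and is left alone.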

> Review compaction heuristic and move compaction code out so standalone and independently testable
> -------------------------------------------------------------------------------------------------
>                 Key: HBASE-2462
>                 URL: https://issues.apache.org/jira/browse/HBASE-2462
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Priority: Critical
>             Fix For: 0.20.5, 0.21.0
>
> Anything that improves our i/o profile makes hbase run smoother.  Over in HBASE-2457,
> good work has already been done describing the tension between minimizing compactions and
> minimizing the count of store files.  This issue is about following on from what was done
> in 2457, and also about breaking the hard-to-read compaction code out of Store.java into a standalone
> class that can be more easily tested (and more easily analyzed for its performance characteristics).
> If possible, in the refactor, we'd allow specification of alternate merge sort implementations.
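The issue asks for compaction code pulled out of Store.java into a standalone class, with alternate merge sort implementations pluggable. A minimal sketch of that shape, assuming compaction can be modelled as merging already-sorted runs of keys (one run per store file): `MergeStrategy`, `HeapMerge`, `PairwiseMerge` and `Compactor` are hypothetical names, not HBase classes. Because `Compactor` depends only on a strategy and plain lists, it can be unit-tested without a running region server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical pluggable merge step for a standalone compactor; not an HBase API.
// Assumption: each input store file is modelled as a run of keys already in sorted order.
interface MergeStrategy {
    <T> List<T> merge(List<List<T>> sortedRuns, Comparator<? super T> cmp);
}

// k-way merge with a min-heap of cursors: O(n log k) comparisons for n keys in k runs.
// Order among equal keys from different runs is not guaranteed.
class HeapMerge implements MergeStrategy {
    public <T> List<T> merge(List<List<T>> runs, Comparator<? super T> cmp) {
        // Each heap entry is {runIndex, positionInRun}, ordered by the key it points at.
        PriorityQueue<int[]> heap = new PriorityQueue<int[]>(
                (a, b) -> cmp.compare(runs.get(a[0]).get(a[1]), runs.get(b[0]).get(b[1])));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {r, 0});
            }
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<T> run = runs.get(top[0]);
            out.add(run.get(top[1]));
            if (top[1] + 1 < run.size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return out;
    }
}

// Folds runs in one at a time with two-way merges: simpler, but O(n * k) copying.
class PairwiseMerge implements MergeStrategy {
    public <T> List<T> merge(List<List<T>> runs, Comparator<? super T> cmp) {
        List<T> acc = new ArrayList<>();
        for (List<T> run : runs) {
            List<T> next = new ArrayList<>(acc.size() + run.size());
            int i = 0;
            int j = 0;
            while (i < acc.size() && j < run.size()) {
                // <= keeps keys from earlier runs first on ties.
                if (cmp.compare(acc.get(i), run.get(j)) <= 0) {
                    next.add(acc.get(i++));
                } else {
                    next.add(run.get(j++));
                }
            }
            while (i < acc.size()) next.add(acc.get(i++));
            while (j < run.size()) next.add(run.get(j++));
            acc = next;
        }
        return acc;
    }
}

// Standalone compactor: depends only on a strategy and plain lists, so it is unit-testable.
class Compactor {
    private final MergeStrategy strategy;

    Compactor(MergeStrategy strategy) {
        this.strategy = strategy;
    }

    <T> List<T> compact(List<List<T>> sortedRuns, Comparator<? super T> cmp) {
        return strategy.merge(sortedRuns, cmp);
    }

    public static void main(String[] args) {
        List<List<String>> runs = Arrays.asList(
                Arrays.asList("a", "d", "g"), Arrays.asList("b", "e"), Arrays.asList("c", "f", "h"));
        // Both strategies print [a, b, c, d, e, f, g, h]
        System.out.println(new Compactor(new HeapMerge()).compact(runs, Comparator.<String>naturalOrder()));
        System.out.println(new Compactor(new PairwiseMerge()).compact(runs, Comparator.<String>naturalOrder()));
    }
}
```

Swapping `HeapMerge` for `PairwiseMerge` gives the same output on distinct keys. The heap version scales as O(n log k) in the number of runs, while the pairwise version is simpler but re-copies earlier output once per run, which is the kind of performance characteristic a standalone class makes easy to measure.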

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
