hbase-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7763) Compactions not sorting based on size anymore.
Date Wed, 06 Feb 2013 21:41:22 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572862#comment-13572862 ]

Sergey Shelukhin commented on HBASE-7763:

1) The book seems to disagree about seqNum:
"To overwrite an existing value, do a put at exactly the same row, column, and version as
that of the cell you would overshadow."
"If multiple writes to a cell have the same version, are all versions maintained or just the
last? ... Currently, only the last written is fetchable."

This may not be a big deal to change, though; maybe someone else can comment. I had previously
assumed this behavior was important when thinking about compactions.

2) Is there any particular reason to run two iterations of selection? Can it run until it stops
compacting or the number of files returns to the same baseline? Also, -6k/-4k files is hard to
judge relative to a baseline.

3) +1 on taking the smallest files in case of max files limitation.

4) In your 100000000-900-1 example I would argue that the 900 and 1 files are similar, in light
of the 100000000 file. This is really a question of whether you want more I/O and more files
on average but smaller compactions, or less I/O and fewer files but larger compactions. I am
not an expert on customer scenarios; I wonder if L/R could be configurable?
Also, Facebook was trying to solve a similar problem with tier-based compaction (HBASE-6371,
HBASE-7055), where files would be selected based on their characteristics, for example size.
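
To make the question in point 4 concrete, here is a minimal sketch of a ratio rule. It is a simplified model and not HBase's actual selection code: the function name `select_by_ratio` and the `ratio` and `min_files` parameters are hypothetical. A file is skipped while its size exceeds `ratio` times the combined size of all smaller files.

```python
def select_by_ratio(sizes, ratio, min_files=2):
    """Return the file sizes chosen for compaction under a simplified ratio rule.

    Hypothetical model, not HBase's implementation: files are ordered from
    largest to smallest, and leading files are skipped while they exceed
    `ratio` times the total size of all remaining (smaller) files.
    """
    ordered = sorted(sizes, reverse=True)
    start = 0
    while start < len(ordered):
        rest = sum(ordered[start + 1:])
        if ordered[start] <= ratio * rest:
            break  # this file is "similar enough" to the rest; stop skipping
        start += 1
    selected = ordered[start:]
    # A compaction needs at least `min_files` inputs to be worth running.
    return selected if len(selected) >= min_files else []


files = [100000000, 900, 1]
print(select_by_ratio(files, ratio=1.2))     # nothing qualifies
print(select_by_ratio(files, ratio=1000))    # 900 and 1 are compacted together
print(select_by_ratio(files, ratio=200000))  # all three files are rewritten
```

In this model the ratio alone decides whether the 900 and 1 files count as similar, which is why making such a threshold configurable changes the balance between compaction size and file count.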

> Compactions not sorting based on size anymore.
> ----------------------------------------------
>                 Key: HBASE-7763
>                 URL: https://issues.apache.org/jira/browse/HBASE-7763
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction
>    Affects Versions: 0.96.0, 0.94.4
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>            Priority: Critical
>             Fix For: 0.96.0, 0.94.6
>         Attachments: HBASE-7763-trunk-TESTING.patch, HBASE-7763-trunk-TESTING.patch,
> Currently compaction selection is not sorting based on size.  This causes selection to
choose larger files to re-write than are needed when bulk loads are involved.
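
The effect described above can be illustrated with a small sketch, assuming a simplified model in which selection just takes the first `max_files` candidates. The function `pick_files`, its parameters, and the sample sizes are hypothetical and are not taken from the HBase code or from the attached patches.

```python
def pick_files(sizes, max_files, sort_by_size):
    """Pick up to `max_files` files, optionally preferring the smallest ones.

    Hypothetical model: without sorting, files are taken in store order.
    """
    candidates = sorted(sizes) if sort_by_size else list(sizes)
    return candidates[:max_files]


# Hypothetical store order: a large bulk-loaded file sits among small files.
store = [5, 4000, 7, 6, 3]

unsorted_pick = pick_files(store, 3, sort_by_size=False)
sorted_pick = pick_files(store, 3, sort_by_size=True)

# Total size that would be rewritten by each selection.
print(sum(unsorted_pick), sum(sorted_pick))
```

With these made-up sizes, the unsorted selection includes the large file and rewrites far more data than the size-sorted selection of the three smallest files.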
