hbase-issues mailing list archives

From "Nicolas Spiegelberg (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5330) TestCompactSelection - adding 2 test cases to testCompactionRatio
Date Mon, 06 Feb 2012 16:10:00 GMT

[ https://issues.apache.org/jira/browse/HBASE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201362#comment-13201362 ]

Nicolas Spiegelberg commented on HBASE-5330:

I spent a little time on this yesterday.  This is correct behavior as written.  Some detail:
// Change
compactEquals(store.compactSelection(sfCreate(7,6,5,4,3,2,1)).getFilesToCompact(), 7,6,5,4,3);
// TO:
compactEquals(sfCreate(7, 6, 5, 4, 3, 2, 1), 7, 6, 5, 4, 3);

The original code runs a compaction selection, takes the selected files, then runs a second
selection on them.  Obviously, this is an identity operation, but it is not technically correct
since we're "double compacting".

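To illustrate the "double compacting" point, here is a minimal sketch with a toy selector. The class `DoubleSelectionSketch`, the method `select`, and the "first maxFiles" rule are assumptions made for illustration; this is not the HBase Store code, and maxFiles = 5 is assumed from the five files in the expected result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy stand-in for a compaction selector; NOT the HBase Store implementation.
class DoubleSelectionSketch {
    // Hypothetical rule: keep at most maxFiles candidates, starting from the first.
    static List<Long> select(List<Long> candidates, int maxFiles) {
        int n = Math.min(maxFiles, candidates.size());
        return new ArrayList<>(candidates.subList(0, n));
    }

    public static void main(String[] args) {
        List<Long> files = Arrays.asList(7L, 6L, 5L, 4L, 3L, 2L, 1L);
        List<Long> once = select(files, 5);   // one selection pass
        List<Long> twice = select(once, 5);   // selecting again from the selected files
        System.out.println(once);               // prints [7, 6, 5, 4, 3]
        System.out.println(once.equals(twice)); // prints true
    }
}
```

With this toy rule the second pass changes nothing, so an assertion still passes, but the test ends up exercising the selector twice instead of once.
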
store.forceMajor = true;
compactEquals(sfCreate(7, 6, 5, 4, 3, 2, 1), 7, 6, 5, 4, 3);
This should return [3:7] because it's NOT actually doing a major compaction.  Currently, the
algorithm downgrades majors with too many files.  This is not really the behavior we
want.  Instead, for a major compaction, we should try to compact storefiles[0:N] where N >=
min(minFiles, sizeof(storefiles)).  This will be a little tricky, because the candidate files
don't always contain storefile[0], which is necessary for compaction.
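
The constraint on the proposed major selection can be written down as a small check. This is a sketch only: it reads storefiles[0:N] as "the first N files", which is an interpretation, and the class `MajorPrefixSketch`, the method `isAcceptableMajorSelection`, and the minFiles value of 3 are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative check only; names and the minFiles value are assumptions.
class MajorPrefixSketch {
    // True when 'selected' equals the first N store files for some
    // N >= min(minFiles, storeFiles.size()).
    static boolean isAcceptableMajorSelection(List<Long> storeFiles,
                                              List<Long> selected,
                                              int minFiles) {
        int n = selected.size();
        if (n > storeFiles.size() || n < Math.min(minFiles, storeFiles.size())) {
            return false;
        }
        // The selection has to start at storefile[0] and be contiguous.
        return storeFiles.subList(0, n).equals(selected);
    }

    public static void main(String[] args) {
        List<Long> storeFiles = Arrays.asList(7L, 6L, 5L, 4L, 3L, 2L, 1L);
        // Starts at storefile[0]: acceptable under this reading.
        System.out.println(isAcceptableMajorSelection(
                storeFiles, Arrays.asList(7L, 6L, 5L, 4L, 3L), 3)); // prints true
        // Same size, but does not contain storefile[0].
        System.out.println(isAcceptableMajorSelection(
                storeFiles, Arrays.asList(5L, 4L, 3L, 2L, 1L), 3)); // prints false
    }
}
```
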

// Reference compaction
compactEquals(sfCreate(true, 7, 6, 5, 4, 3, 2, 1), 5, 4, 3, 2, 1);

This is correct as written, but still needs some improvement.  As I recall, the original reasoning
was that we'd only hit this case when we had a bug where we kept flushing storefiles.  We
weren't sure how to handle it at the time (we had production pressure).  The problem is that we
didn't have the state of previous compactions and we thought we'd have to get the whole
candidate set.  The idea was that, if we're going to recompact the same files multiple times,
it should be the smaller files at the end rather than the last file.  Since we only need a
shard of the files for major compaction and reference files keep inherent state, we can improve
> TestCompactSelection - adding 2 test cases to testCompactionRatio
> -----------------------------------------------------------------
>                 Key: HBASE-5330
>                 URL: https://issues.apache.org/jira/browse/HBASE-5330
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Assignee: Doug Meil
>            Priority: Minor
>         Attachments: TestCompactSelection_hbase_5330.java.patch
> There were three existing assertions in TestCompactSelection testCompactionRatio that
> did "max # of files" assertions...
> {code}
>     assertEquals(maxFiles,
>         store.compactSelection(sfCreate(7,6,5,4,3,2,1)).getFilesToCompact().size());
> {code}
> ... and for references ...
> {code}
>     assertEquals(maxFiles,
>         store.compactSelection(sfCreate(true, 7,6,5,4,3,2,1)).getFilesToCompact().size());
> {code}
> ... but they didn't assert against which StoreFiles got selected.  While the number of
> StoreFiles is the same, the files selected are actually different, and I thought that there
> should be explicit assertions showing that.
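
A small sketch of why a size-only assertion is not enough here. The helpers `firstN` and `lastN` and the class `SelectionAssertionSketch` are hypothetical stand-ins that only reproduce the two expected results quoted in this thread (7,6,5,4,3 for regular files and 5,4,3,2,1 for references); they are not the HBase selection algorithm, and maxFiles = 5 is assumed from those results.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helpers that mimic the two expected selections; not HBase code.
class SelectionAssertionSketch {
    // Mirrors the regular-file result: the first n candidates.
    static List<Long> firstN(List<Long> files, int n) {
        return files.subList(0, Math.min(n, files.size()));
    }

    // Mirrors the reference-file result: the last n candidates.
    static List<Long> lastN(List<Long> files, int n) {
        int from = Math.max(0, files.size() - n);
        return files.subList(from, files.size());
    }

    public static void main(String[] args) {
        List<Long> files = Arrays.asList(7L, 6L, 5L, 4L, 3L, 2L, 1L);
        int maxFiles = 5; // assumed value
        List<Long> regular = firstN(files, maxFiles);
        List<Long> reference = lastN(files, maxFiles);
        // A size-only check cannot tell the two selections apart.
        System.out.println(regular.size() == reference.size()); // prints true
        // Comparing the contents can.
        System.out.println(regular);   // prints [7, 6, 5, 4, 3]
        System.out.println(reference); // prints [5, 4, 3, 2, 1]
    }
}
```
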
