cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-4341) Small SSTable Segments Can Hurt Leveling Process
Date Fri, 22 Jun 2012 11:23:43 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-4341:
----------------------------------------

    Attachment: 4341-fix.txt

This patch might leave leveled compaction stuck in an infinite compaction loop if compression
is used and no more data comes in.

The problem is that if you have, say, 2 sstables in L0, but those are not bigger than sstableMaxSize,
we will compact them in L0 but we might end up with 2 sstables in L0 instead of 1. The
reason this can happen is another problem that is older than this patch: when leveled
compaction runs, it splits sstables at sstableMaxSize of *uncompressed* data. However,
LeveledManifest (the patch on this ticket included) considers the level sizes to be *on-disk
sizes*. So 2 sstables can be less than 10MB of on-disk size, but compacting them will
still generate 2 sstables because the uncompressed size is > 10 MB.
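
To make the mismatch concrete, here is a minimal sketch of the arithmetic. It is not Cassandra code: the class, the method names, the 8 MB inputs and the uniform 0.5 compression ratio are all assumptions chosen for illustration, with the 10 MB limit standing in for sstableMaxSize.

```java
// Hypothetical sketch: none of these names are Cassandra's real classes.
final class SplitMismatchSketch {
    // Stand-in for sstableMaxSize (10 MB, as in the example above).
    static final long MAX_SSTABLE_BYTES = 10L * 1024 * 1024;

    /** Output sstables when the writer rolls over every maxBytes of *uncompressed* data. */
    static long outputsWhenSplittingOnUncompressed(long uncompressedBytes, long maxBytes) {
        return Math.max(1, (uncompressedBytes + maxBytes - 1) / maxBytes);
    }

    /** What the manifest measures: on-disk bytes, assuming a uniform on-disk/uncompressed ratio. */
    static long onDiskBytes(long uncompressedBytes, double compressionRatio) {
        return Math.round(uncompressedBytes * compressionRatio);
    }

    public static void main(String[] args) {
        long uncompressed = 16L * 1024 * 1024; // two L0 sstables, 8 MB uncompressed each (assumed)
        double ratio = 0.5;                    // assumed compression ratio
        long onDisk = onDiskBytes(uncompressed, ratio);
        long outputs = outputsWhenSplittingOnUncompressed(uncompressed, MAX_SSTABLE_BYTES);
        // The manifest sees 8 MB on disk (under the limit), yet the split produces 2 sstables.
        System.out.println("on-disk under limit: " + (onDisk < MAX_SSTABLE_BYTES));
        System.out.println("output sstables: " + outputs);
    }
}
```

With these assumed numbers the manifest keeps seeing less than one sstable's worth of data in L0 while each compaction keeps writing 2 sstables, which is the loop described above.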

In theory there are 2 possible fixes for that:
# when we compact, consider the on-disk size to split sstables.
# in LeveledManifest, consider level size in uncompressed data size instead of on-disk size.

I think the first solution is closer to the initial intention, in that we want files on disk
to be what the user sets with sstableMaxSize. Besides, the 2nd solution would
artificially increase the size of every level, which would make the upgrade a bit painful
since it would generate a lot of compaction to rebalance the levels. So I'm attaching a patch
that implements the first idea. (I note that because our SequentialWriter buffers data before
writing it, getting the on-disk file pointer gives us a position aligned on the buffer size, but
I don't think that matters in this case, except that it makes it an error to have a
SequentialWriter buffer size > compression block size).
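
The idea of splitting on the on-disk position can be sketched roughly as follows. This is a toy simulation, not the attached patch: BufferedCompressingWriter, countOutputs, the 64 KB buffer and the uniform compression ratio are hypothetical stand-ins, and the simulation ignores the final flush of a partially filled buffer.

```java
// Hypothetical sketch: none of these names are Cassandra's real classes.
final class OnDiskSplitSketch {
    /** Stand-in for a buffered, compressing writer with an assumed uniform compression ratio. */
    static final class BufferedCompressingWriter {
        private final int bufferSize; // uncompressed bytes held before a flush
        private final double ratio;   // assumed on-disk bytes per uncompressed byte
        private long buffered;
        private long onDiskPosition;

        BufferedCompressingWriter(int bufferSize, double ratio) {
            this.bufferSize = bufferSize;
            this.ratio = ratio;
        }

        void append(long uncompressedBytes) {
            buffered += uncompressedBytes;
            // The on-disk pointer only moves when a full buffer is flushed,
            // so it always sits on a buffer boundary.
            while (buffered >= bufferSize) {
                buffered -= bufferSize;
                onDiskPosition += Math.round(bufferSize * ratio);
            }
        }

        long onDiskPosition() {
            return onDiskPosition;
        }
    }

    /** Counts output sstables when rolling over once the on-disk position passes maxOnDiskBytes. */
    static int countOutputs(int rows, long rowBytes, long maxOnDiskBytes, int bufferSize, double ratio) {
        int outputs = 1;
        BufferedCompressingWriter writer = new BufferedCompressingWriter(bufferSize, ratio);
        for (int i = 0; i < rows; i++) {
            if (writer.onDiskPosition() > maxOnDiskBytes) {
                outputs++;
                writer = new BufferedCompressingWriter(bufferSize, ratio);
            }
            writer.append(rowBytes);
        }
        return outputs;
    }

    public static void main(String[] args) {
        long max = 10L * 1024 * 1024; // stand-in for sstableMaxSize
        // About 16 MB of rows at an assumed 0.5 ratio stays under 10 MB on disk: one sstable.
        System.out.println(countOutputs(1600, 10240, max, 65536, 0.5));
        // The same rows with no compression (ratio 1.0) pass the limit: two sstables.
        System.out.println(countOutputs(1600, 10240, max, 65536, 1.0));
    }
}
```

Under these assumptions the program prints 1 and then 2. Note that the rollover check only ever sees multiples of the flushed buffer size, which is the alignment effect mentioned in the parenthetical above.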

There were 2 other problems with the committed patch:
* The edge case where compaction candidates in L0 were exactly of sstableMaxSize was not handled
correctly, in that the candidates would not be compacted with L1 sstables but would still be
promoted.
* In the case where we had MAX_COMPACTING_L0 candidates, the code wasn't adding the overlapping
sstables from L1.
The attached patch fixes that too.
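
Both cases come down to one invariant: whenever L0 candidates are going to be promoted, the overlapping L1 sstables must be part of the same compaction. The sketch below illustrates that invariant only and is not the code in 4341-fix.txt. Table, promotesToL1 and compactionInputs are hypothetical names, and treating a total of exactly sstableMaxSize as "promote" (>=) is an assumption; what matters is that a single predicate drives both decisions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are Cassandra's real classes.
final class L0CandidateSketch {
    static final int MAX_COMPACTING_L0 = 32; // value quoted in the ticket description

    /** Stand-in for an sstable: a closed key range [first, last] and an on-disk size. */
    static final class Table {
        final long first, last, bytes;

        Table(long first, long last, long bytes) {
            this.first = first;
            this.last = last;
            this.bytes = bytes;
        }

        boolean overlaps(Table other) {
            return first <= other.last && other.first <= last;
        }
    }

    /** One predicate for both "pull in L1" and "promote", so the two cannot disagree. */
    static boolean promotesToL1(List<Table> l0Candidates, long maxSSTableBytes) {
        long total = 0;
        for (Table t : l0Candidates) {
            total += t.bytes;
        }
        return total >= maxSSTableBytes; // ">=" is an assumption, see the lead-in
    }

    static List<Table> compactionInputs(List<Table> l0, List<Table> l1, long maxSSTableBytes) {
        // Cap the number of L0 candidates; the cap must not skip the overlap step below.
        List<Table> l0Part = new ArrayList<>(l0.subList(0, Math.min(l0.size(), MAX_COMPACTING_L0)));
        List<Table> inputs = new ArrayList<>(l0Part);
        if (promotesToL1(l0Part, maxSSTableBytes)) {
            for (Table candidate : l1) {
                for (Table t : l0Part) {
                    if (candidate.overlaps(t)) {
                        inputs.add(candidate);
                        break;
                    }
                }
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        List<Table> l0 = new ArrayList<>();
        l0.add(new Table(0, 10, 5));
        l0.add(new Table(5, 20, 5));
        List<Table> l1 = new ArrayList<>();
        l1.add(new Table(15, 30, 10)); // overlaps the second L0 table
        l1.add(new Table(40, 50, 10)); // overlaps nothing
        // Total L0 size equals the limit exactly, so the overlapping L1 table is pulled in.
        System.out.println(compactionInputs(l0, l1, 10).size());
    }
}
```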

                
> Small SSTable Segments Can Hurt Leveling Process
> ------------------------------------------------
>
>                 Key: CASSANDRA-4341
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4341
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Benjamin Coverston
>            Assignee: Jonathan Ellis
>            Priority: Minor
>              Labels: compaction
>             Fix For: 1.1.2
>
>         Attachments: 4341-fix.txt, 4341.txt
>
>
> This concerns:
> static int MAX_COMPACTING_L0 = 32;
> Repair can create very small SSTable segments. We should consider moving to a threshold
> that takes into account the size of the files brought into compaction rather than the number
> of files for this and similar situations. Bringing the small files from L0 to L1 magnifies
> the issue.
> If there are too many very small files in L0 perhaps even an intermediate compaction
> would even reduce the magnifying effect of a L0 to L1 compaction.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
