cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-3234) LeveledCompaction has several performance problems
Date Wed, 21 Sep 2011 17:46:11 GMT


Sylvain Lebresne updated CASSANDRA-3234:

    Attachment: 0005-use-Array-and-Tree-backed-columns-in-compaction-v2.patch

I haven't looked at the first 3 patches, only at patches 4 and 5.

+1 on patch 4 (though I agree with the comment in there that it's not the most beautiful refactor
ever :))

On patch 5, it calls cloneMeShallow on the first column family read and so basically skips all
of that column family's columns, which is wrong. Attaching a v2 that makes SSTII directly use the
right ISortedColumn factory (to avoid a full clone). The problem is that this doesn't translate
too well to ParallelCompactionIterable, since the actual read happens deep in the code. For it,
I think we have 3 easy solutions:
  * Just use ArraySortedColumns all the way. This is actually ok because addAll works whatever
the input is, since it does a merge.
  * Do a full clone to a TreeMapBacked CF on the first cf read.
  * Use TreeMapBacked CFs all the way.

I went with the first solution in the attached patch (mostly because it requires the fewest
changes), though that's probably not optimal for LeveledCompaction (but I'm not
sure ParallelCompaction is useful for LeveledCompaction).
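
To illustrate why the first option is safe, here is a minimal sketch of an addAll that merges a sorted input into an array-backed container. It is not Cassandra's actual code: the class ArrayBackedSketch, its String column names, and the "incoming wins on equal names" rule are all assumptions made for illustration (the real code would reconcile columns rather than just overwrite). The point is only that the merge depends on the input's iteration order, not on whether the input is array-backed or tree-backed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for an array-backed sorted column container.
// Column names are plain Strings here; this is not Cassandra's actual class.
public class ArrayBackedSketch {
    private final List<String> names = new ArrayList<>();

    // Merge an already-sorted, duplicate-free input into this container.
    // Only the iteration order of the input matters, not its backing structure.
    public void addAll(Collection<String> sortedInput) {
        List<String> merged = new ArrayList<>(names.size() + sortedInput.size());
        int i = 0;
        for (String incoming : sortedInput) {
            // Copy existing names that sort strictly before the incoming one.
            while (i < names.size() && names.get(i).compareTo(incoming) < 0) {
                merged.add(names.get(i++));
            }
            // On equal names, keep the incoming one and skip the existing one
            // (an assumption of this sketch, standing in for real reconciliation).
            if (i < names.size() && names.get(i).equals(incoming)) {
                i++;
            }
            merged.add(incoming);
        }
        // Append whatever is left of the existing names.
        while (i < names.size()) {
            merged.add(names.get(i++));
        }
        names.clear();
        names.addAll(merged);
    }

    // Defensive copy of the current sorted names.
    public List<String> names() {
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        ArrayBackedSketch cf = new ArrayBackedSketch();
        cf.addAll(Arrays.asList("a", "c", "e"));                // array-backed input
        cf.addAll(new TreeSet<>(Arrays.asList("b", "c", "d"))); // tree-backed input
        System.out.println(cf.names()); // prints [a, b, c, d, e]
    }
}
```

The cost is that every addAll rebuilds the backing list, which is presumably part of why this may not be optimal when many sources overlap.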

> LeveledCompaction has several performance problems
> --------------------------------------------------
>                 Key: CASSANDRA-3234
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0.0
>         Attachments: 0001-optimize-single-source-case-for-MergeIterator.txt, 0002-add-TrivialOneToOne-optimization.txt, 0003-fix-leveled-BF-size-calculation.txt, 0004-avoid-calling-shouldPurge-unless-necessary.txt, 0005-use-Array-and-Tree-backed-columns-in-compaction-v2.patch, 0005-use-Array-and-Tree-backed-columns-in-compaction.txt
> Two main problems:
> - BF size calculation doesn't take into account LCS breaking the output apart into "bite sized" sstables, so memory use is much higher than predicted
> - ManyToMany merging is slow.  At least part of this is from running the full reducer machinery against single input sources, which can be optimized away.
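
The single-source point in the description can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the attached patches: MergeSketch, Head, and merge are hypothetical names, the sources are sorted iterators of integers without internal duplicates, and the general path materializes its output into a list for brevity. When there is exactly one source, the heap and de-duplication machinery is skipped and the source is returned as-is.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a k-way merge with a single-source fast path.
public class MergeSketch {
    // Current head of one source, so sources can be ordered in a heap.
    private static final class Head implements Comparable<Head> {
        final int value;
        final Iterator<Integer> rest;

        Head(int value, Iterator<Integer> rest) {
            this.value = value;
            this.rest = rest;
        }

        @Override
        public int compareTo(Head other) {
            return Integer.compare(value, other.value);
        }
    }

    // Merge sorted sources into one sorted iterator, dropping values
    // that appear in more than one source.
    public static Iterator<Integer> merge(List<Iterator<Integer>> sources) {
        // Fast path: one source is already the merged result.
        if (sources.size() == 1) {
            return sources.get(0);
        }
        PriorityQueue<Head> heap = new PriorityQueue<>();
        for (Iterator<Integer> source : sources) {
            if (source.hasNext()) {
                heap.add(new Head(source.next(), source));
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head head = heap.poll();
            // Skip a value equal to the one just emitted.
            if (out.isEmpty() || out.get(out.size() - 1) != head.value) {
                out.add(head.value);
            }
            // Re-insert the source with its next value, if any.
            if (head.rest.hasNext()) {
                heap.add(new Head(head.rest.next(), head.rest));
            }
        }
        return out.iterator();
    }
}
```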

This message is automatically generated by JIRA.
For more information on JIRA, see:
