cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11412) Many sstablescanners opened during repair
Date Mon, 11 Apr 2016 23:52:25 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236263#comment-15236263 ]

Paulo Motta commented on CASSANDRA-11412:
-----------------------------------------

The code and tests look good, and this is definitely a big improvement over what we had before.
However, as [~molsson] mentioned, we would still need to keep one {{ISSTableScanner}} instance
per sstable open during the repair process in the non-LCS case. Do you think we should worry
about optimizing this further by lazily opening {{ISSTableScanner}} instances as partitions are
iterated?

Building on [~molsson]'s suggestion, I thought we could change
{{AbstractCompactionStrategy.getScanners(sstables, ranges)}} to return a {{RangeScannerIterator}}
instead, which returns a list of overlapping scanners at each iteration. This iterator would hold
an {{OrderedMap<Range<Token>, Set<SSTableReader>>}} mapping each exclusive subrange to the set of
sstables that overlap it, and would lazily instantiate {{ISSTableScanner}} instances as it iterates
the subranges, possibly reusing scanners from previous iterations and discarding them when they
are no longer needed.
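
As a rough illustration of the exclusive-subrange idea (a sketch under simplifying assumptions, not the proposed patch), the Java below uses plain {{long}} tokens and sstable names instead of {{Range<Token>}} and {{SSTableReader}}. {{SubrangeSketch}}, {{Table}}, {{exclusiveSubranges}} and {{peakOpenScanners}} are hypothetical names that do not exist in Cassandra, the map is keyed by the start token of each subrange rather than by the range itself, and range-boundary semantics are simplified.

```java
import java.util.*;

/** Hypothetical sketch; none of these types are Cassandra classes. */
class SubrangeSketch {
    /** Stand-in for an sstable covering the token interval from first to last. */
    static final class Table {
        final String name; final long first; final long last;
        Table(String name, long first, long last) { this.name = name; this.first = first; this.last = last; }
    }

    /** Maps the start token of each exclusive subrange to the tables that cover it. */
    static NavigableMap<Long, Set<String>> exclusiveSubranges(List<Table> tables) {
        // Every table boundary is a potential subrange boundary.
        TreeSet<Long> bounds = new TreeSet<>();
        for (Table t : tables) { bounds.add(t.first); bounds.add(t.last); }

        NavigableMap<Long, Set<String>> result = new TreeMap<>();
        Long prev = null;
        for (Long b : bounds) {
            if (prev != null) {
                // A table overlaps the subrange between prev and b if it spans all of it.
                Set<String> overlapping = new TreeSet<>();
                for (Table t : tables)
                    if (t.first <= prev && b <= t.last) overlapping.add(t.name);
                if (!overlapping.isEmpty()) result.put(prev, overlapping);
            }
            prev = b;
        }
        return result;
    }

    /** Walks the subranges in token order, "opening" scanners lazily; returns the peak number open. */
    static int peakOpenScanners(NavigableMap<Long, Set<String>> subranges) {
        Set<String> open = new HashSet<>();
        int peak = 0;
        for (Set<String> needed : subranges.values()) {
            open.retainAll(needed); // discard scanners that are no longer needed
            open.addAll(needed);    // reuse scanners still open, open the missing ones
            peak = Math.max(peak, open.size());
        }
        return peak;
    }

    public static void main(String[] args) {
        List<Table> tables = Arrays.asList(
            new Table("A", 0, 10), new Table("B", 5, 15), new Table("C", 20, 30));
        NavigableMap<Long, Set<String>> subranges = exclusiveSubranges(tables);
        System.out.println(subranges);
        System.out.println("peak open scanners: " + peakOpenScanners(subranges));
    }
}
```

With the three example tables in {{main}}, at most two scanners are open at any point, compared with three if every sstable had its scanner opened up front.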

We would then need to create a new {{UnfilteredPartitionIterator}} to be used during compaction
that would operate over {{RangeScannerIterator}} instances, merging the {{ISSTableScanner}}
instances returned for each exclusive subrange and creating a new merge iterator once the
previous one is exhausted.
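
The "renew the merge iterator" step could look roughly like the sketch below, which assumes each scanner is just a sorted {{Iterator<Long>}} of keys and each subrange contributes one group of such iterators. {{RenewingMergeSketch}} and {{RenewingMergeIterator}} are hypothetical names, not Cassandra APIs, and unlike a real partition merge this sketch does not reconcile equal keys coming from different sources.

```java
import java.util.*;

/** Hypothetical sketch; not Cassandra's UnfilteredPartitionIterator. */
class RenewingMergeSketch {
    /** Merges sorted iterators one group at a time, starting a new merge when the previous one is exhausted. */
    static final class RenewingMergeIterator implements Iterator<Long> {
        // One group of sorted sources per exclusive subrange, in token order.
        private final Iterator<List<Iterator<Long>>> groups;
        // Heap of (head value, source) pairs for the group currently being merged.
        private final PriorityQueue<Map.Entry<Long, Iterator<Long>>> heap =
            new PriorityQueue<>((a, b) -> Long.compare(a.getKey(), b.getKey()));

        RenewingMergeIterator(Iterator<List<Iterator<Long>>> groups) { this.groups = groups; }

        @Override public boolean hasNext() {
            // Once the current merge is drained, renew it from the next non-empty group.
            while (heap.isEmpty() && groups.hasNext())
                for (Iterator<Long> source : groups.next())
                    push(source);
            return !heap.isEmpty();
        }

        @Override public Long next() {
            if (!hasNext()) throw new NoSuchElementException();
            Map.Entry<Long, Iterator<Long>> head = heap.poll();
            push(head.getValue()); // re-queue the source with its next element, if any
            return head.getKey();
        }

        private void push(Iterator<Long> source) {
            if (source.hasNext())
                heap.add(new AbstractMap.SimpleEntry<>(source.next(), source));
        }
    }

    public static void main(String[] args) {
        List<List<Iterator<Long>>> groups = Arrays.asList(
            Arrays.asList(Arrays.asList(1L, 4L).iterator(), Arrays.asList(2L, 3L).iterator()),
            Arrays.asList(Arrays.asList(10L).iterator(), Arrays.asList(5L, 11L).iterator()));
        Iterator<Long> merged = new RenewingMergeIterator(groups.iterator());
        while (merged.hasNext()) System.out.println(merged.next());
    }
}
```

Because the groups are consumed lazily, the sources of a later subrange are not touched until the merge for the earlier subrange has been fully drained.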

The benefit is that we would keep a minimal number of {{ISSTableScanner}} instances open during
compaction, avoiding problems like CASSANDRA-4142, and we would have a single solution for both
LCS and non-LCS. The downside is probably increased complexity and possibly some overhead for
building the exclusive subranges.

Do you think this would work and is worth it? If so, should we do it here or open a new ticket
for it?

> Many sstablescanners opened during repair
> -----------------------------------------
>
>                 Key: CASSANDRA-11412
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11412
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>             Fix For: 3.0.x, 3.x
>
>
> Since CASSANDRA-5220 we open [one sstablescanner per range per sstable|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java#L374].
> If compaction gets behind and you are running vnodes with 256 tokens and RF3, this could become
> a problem (i.e., {{768 * number of sstables}} scanners).
> We could probably refactor this similar to the way we handle scanners with LCS: only
> open the scanner once we need it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
