cassandra-commits mailing list archives

From Björn Hegerfors (JIRA) <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9572) DateTieredCompactionStrategy fails to combine SSTables correctly when TTL is used.
Date Thu, 11 Jun 2015 10:07:01 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581760#comment-14581760 ]

Björn Hegerfors commented on CASSANDRA-9572:
--------------------------------------------

Looks like the right solution to this (except refactoring to avoid calling getFullyExpiredSSTables
twice). But why is the sort still there (line 115/120)? It's redundant since CASSANDRA-8243,
and was removed in 2.1+.
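
The refactoring mentioned above (computing the fully expired set once instead of twice) can be illustrated with a small Python model. This is only a sketch under stated assumptions, not Cassandra's code: the `SSTable` and `CompactionPlan` classes, the `fully_expired` and `run_compaction` functions, and the simplified expiry rule (an expired SSTable may be dropped only if all of its data is older than every live SSTable considered) are hypothetical stand-ins. It shows one way a second call with different inputs can select a different set, and how carrying the first result along avoids that.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for SSTable metadata (not Cassandra's class)."""
    name: str
    min_timestamp: int
    max_timestamp: int
    max_local_deletion_time: int  # latest time at which any cell expires


def fully_expired(candidates: Iterable[SSTable],
                  overlapping: Iterable[SSTable],
                  gc_before: int) -> FrozenSet[SSTable]:
    """Simplified expiry rule (an assumption for illustration only)."""
    candidates = list(candidates)
    expired = [s for s in candidates if s.max_local_deletion_time < gc_before]
    live = [s for s in candidates if s.max_local_deletion_time >= gc_before]
    live += list(overlapping)
    oldest_live = min((s.min_timestamp for s in live), default=float("inf"))
    return frozenset(s for s in expired if s.max_timestamp < oldest_live)


@dataclass(frozen=True)
class CompactionPlan:
    """Carries the strategy's decision so the task need not recompute it."""
    to_compact: FrozenSet[SSTable]
    expired: FrozenSet[SSTable]


def run_compaction(plan: CompactionPlan) -> List[str]:
    """Return the names of the SSTables that are actually rewritten."""
    return sorted(s.name for s in plan.to_compact - plan.expired)


old = SSTable("old", min_timestamp=0, max_timestamp=10, max_local_deletion_time=50)
live = SSTable("live", min_timestamp=5, max_timestamp=20, max_local_deletion_time=200)
new = SSTable("new", min_timestamp=30, max_timestamp=40, max_local_deletion_time=200)

# Two independent calls with different inputs disagree in this model.
first = fully_expired([old], overlapping=[new], gc_before=100)
second = fully_expired([old, new], overlapping=[live], gc_before=100)

# Computing once and passing the result along keeps both stages consistent.
plan = CompactionPlan(to_compact=frozenset({old, new}), expired=first)
rewritten = run_compaction(plan)
```

In this model the first call selects `old` as fully expired while the second call selects nothing, because the second call sees the overlapping `live` SSTable. With the plan object, the task drops `old` and rewrites only `new`.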

> DateTieredCompactionStrategy fails to combine SSTables correctly when TTL is used.
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9572
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9572
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Antti Nissinen
>            Assignee: Marcus Eriksson
>              Labels: dtcs
>             Fix For: 3.x, 2.1.x, 2.0.x, 2.2.x
>
>         Attachments: cassandra_sstable_metadata_reader.py, cassandra_sstable_timespan_graph.py,
>                      compaction_stage_test01_jira.log, compaction_stage_test02_jira.log, datagen.py,
>                      explanation_jira.txt, first_results_after_patch.txt, motivation_jira.txt,
>                      src_2.1.5_with_debug.zip
>
>
> DateTieredCompaction works correctly when the data for a certain time period is written out
> in time order as short SSTables and then compacted together. However, if a TTL is applied to the
> data columns, DTCS fails to compact the files correctly and in a timely manner. In our opinion the
> problem is caused by two issues:
> A) During the DateTieredCompaction process, getFullyExpiredSSTables is called twice:
> first from the DateTieredCompactionStrategy class and a second time from the CompactionTask
> class. The first call is meant to find the fully expired SSTables that do not overlap
> with any non-fully-expired SSTables. That works correctly. When getFullyExpiredSSTables
> is called the second time, from the CompactionTask class, the selection of fully expired SSTables
> differs from the first selection.
> B) The minimum timestamp of the new SSTables created by combining fully expired
> SSTables with files from the most interesting bucket is not correct.
> Together, these two issues cause problems for DTCS when it combines
> SSTables that overlap in time and have a TTL on the column. This is demonstrated by first generating
> test data without compactions and showing the distribution of the files over time. When
> compaction is enabled, DTCS combines files together, but the end result is not what
> would be expected. This is demonstrated in the file motivation_jira.txt.
> The attachments contain the following material:
> - motivation_jira.txt: practical examples of how DTCS behaves with TTL
> - explanation_jira.txt: gives more details, explains the test cases and demonstrates the
> problems in the compaction process
> - Log file for the compactions in the first test case (compaction_stage_test01_jira.log)
> - Log file for the compactions in the second test case (compaction_stage_test02_jira.log)
> - Source code zip file for version 2.1.5 with additional comment statements (src_2.1.5_with_debug.zip)
> - Python script to generate test data (datagen.py)
> - Python script to read metadata from SSTables (cassandra_sstable_metadata_reader.py)
> - Python script to generate a timeline representation of SSTables (cassandra_sstable_timespan_graph.py)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
