cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-1967) commit log replay shouldn't end with a flush
Date Tue, 11 Jan 2011 23:20:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980424#action_12980424 ]

Jonathan Ellis commented on CASSANDRA-1967:
-------------------------------------------

The main reason to flush after replay is that you then never have to replay that same data
again.

Every once in a while, someone with excessively large memtable thresholds OOMs during
replay.  I'd actually like to flush after replaying each segment, so that as long as you
can finish one segment before OOMing, you still make progress.
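A per-segment flush during replay could look roughly like the following. This is a minimal, self-contained simulation under stated assumptions, not Cassandra's replay code: `Segment`, `Mutation`, `replayAll`, `lastFlushedSegment`, and the in-memory `sstables` list are all hypothetical stand-ins. It shows that flushing after every segment caps the memtable at one segment's worth of data and records how far replay got, so a crash in a later segment would not force earlier segments to be replayed again (provided that position were persisted, which this model does not do).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of commit log replay that flushes after every segment.
// Segment, Mutation, and the in-memory "sstables" list are illustrative
// stand-ins, not Cassandra classes.
class SegmentReplaySketch {
    record Mutation(String key, String value) {}
    record Segment(int id, List<Mutation> mutations) {}

    final Map<String, String> memtable = new HashMap<>();
    final List<Map<String, String>> sstables = new ArrayList<>();
    // Highest segment id whose contents are already in an sstable.
    // A real system would persist this; here it lives in memory only.
    int lastFlushedSegment = -1;

    void replayAll(List<Segment> segments) {
        for (Segment seg : segments) {
            if (seg.id() <= lastFlushedSegment) {
                continue; // already durable from an earlier, interrupted replay
            }
            for (Mutation m : seg.mutations()) {
                memtable.put(m.key(), m.value());
            }
            // Flushing here bounds the memtable to one segment's data; if replay
            // dies in a later segment, this one need not be replayed again.
            flush(seg.id());
        }
    }

    private void flush(int segmentId) {
        if (!memtable.isEmpty()) {
            sstables.add(new HashMap<>(memtable)); // stand-in for writing an sstable
            memtable.clear();
        }
        lastFlushedSegment = segmentId;
    }

    public static void main(String[] args) {
        SegmentReplaySketch replay = new SegmentReplaySketch();
        List<Segment> log = List.of(
                new Segment(0, List.of(new Mutation("a", "1"))),
                new Segment(1, List.of(new Mutation("b", "2"), new Mutation("c", "3"))));
        replay.replayAll(log);
        // prints "2 sstables, last flushed segment 1"
        System.out.println(replay.sstables.size() + " sstables, last flushed segment "
                + replay.lastFlushedSegment);
    }
}
```

The trade-off is visible in the output: every replayed segment produces its own sstable, which is exactly the small-sstable effect on compaction discussed next.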

The problem isn't flushing per se (if you're 90% full, it makes no difference whether you
flush now or after another two minutes of write load), but rather flushing mostly-empty
sstables that still count towards the compaction threshold.

Perhaps introducing a "don't bother compacting if one sstable is < X% of the next-smallest"
rule would fix this.
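One literal reading of the proposed rule is sketched below. It is a hypothetical check, not Cassandra's compaction code: `shouldCompact`, `minThreshold` (standing in for the compaction threshold mentioned above), and `ratio` (the "X%", e.g. 0.1 for 10%) are illustrative names. Sizes can be in any consistent unit. The example shows how a near-empty post-replay sstable would otherwise be the fourth sstable that tips a set over the threshold.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical check for the proposed rule; not Cassandra's compaction code.
class TinySSTableRuleSketch {
    /**
     * @param sizes        sizes of the sstables that would be compacted together
     * @param minThreshold stand-in for the compaction threshold (number of sstables)
     * @param ratio        the "X%" from the proposal, e.g. 0.1 for 10%
     */
    static boolean shouldCompact(List<Long> sizes, int minThreshold, double ratio) {
        if (sizes.size() < minThreshold) {
            return false; // not enough sstables yet
        }
        List<Long> sorted = new ArrayList<>(sizes);
        Collections.sort(sorted);
        for (int i = 0; i + 1 < sorted.size(); i++) {
            // Literal reading: don't compact if some sstable is < X% of the next-smallest.
            if (sorted.get(i) < ratio * sorted.get(i + 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Four similar sstables: the threshold alone decides (prints true).
        System.out.println(shouldCompact(List.of(100L, 110L, 95L, 105L), 4, 0.1));
        // Three real sstables plus a near-empty post-replay flush (prints false).
        System.out.println(shouldCompact(List.of(100L, 110L, 95L, 1L), 4, 0.1));
    }
}
```

Under this literal reading a near-empty sstable keeps blocking compaction of any set it belongs to; a softer variant would simply leave such an sstable out of the count toward the threshold.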

> commit log replay shouldn't end with a flush
> --------------------------------------------
>
>                 Key: CASSANDRA-1967
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1967
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.3
>            Reporter: Robert Coli
>
> (Apologies in advance if there is some very compelling reason to flush after replay,
> of which I am not currently aware. ;D)
> Currently, when a node restarts, the following sequence occurs:
> a) commitlog is replayed
> b) any memtables resulting from a) are flushed 
> c) a new commitlog is opened, new memtables are switched in
> ... (other stuff happens)
> d) node starts taking traffic
> This has side effects, perhaps most seriously the potential to trigger compaction.
> As a node is likely to struggle performance-wise after restarting, triggering compaction at
> that time seems like something we might wish to avoid.
> I propose that the sequence be:
> a) commitlog is replayed
> b) a new commitlog is opened, new memtables are switched in 
> ... (other stuff happens)
> c) node starts taking traffic
> Looking through the relevant code, the only code that appears to depend on this flush
> is at src/java/org/apache/cassandra/db/commitlog/CommitLog.java:112:
> "
>         // all old segments are recovered and deleted before CommitLog is instantiated.
>         // All we need to do is create a new one.
>         segments.add(new CommitLogSegment());
> "
> Presumably this code would have to be refactored to be aware of the currently open commitlog.
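The proposed order, and the refactoring it implies, can be modelled as follows. This is a hypothetical sketch, not Cassandra's CommitLog: `Segment`, `start`, `write`, `flush`, and `sstableCount` are illustrative names, and a real implementation would need finer-grained bookkeeping (for example, per-column-family dirty state). Instead of assuming every old segment was recovered and deleted, as the quoted comment does, the model keeps the replayed segments until a later, ordinary flush makes their data durable, and appends the new active segment after them.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the proposed startup order; not Cassandra's CommitLog.
class StartupWithoutFlushSketch {
    static final class Segment {
        final int id;
        boolean dirty; // holds data not yet written to an sstable
        Segment(int id, boolean dirty) { this.id = id; this.dirty = dirty; }
    }

    final Deque<Segment> segments = new ArrayDeque<>();
    final Map<String, String> memtable = new HashMap<>();
    int sstableCount = 0;

    /** a) replay the old segments, b) open a new segment; no flush in between. */
    void start(List<Map<String, String>> recoveredSegments) {
        for (int id = 0; id < recoveredSegments.size(); id++) {
            memtable.putAll(recoveredSegments.get(id)); // replay into the memtable only
            // Keep the recovered segment: its data is not in any sstable yet.
            segments.addLast(new Segment(id, true));
        }
        // Rather than assuming all old segments were flushed and deleted,
        // append the new active segment after the recovered ones.
        segments.addLast(new Segment(recoveredSegments.size(), false));
        // c) the node can take traffic; no sstable was written, so nothing
        // new counts toward the compaction threshold.
    }

    void write(String key, String value) {
        memtable.put(key, value);
        segments.getLast().dirty = true;
    }

    /** An ordinary flush later on is what finally lets recovered segments go. */
    void flush() {
        if (!memtable.isEmpty()) {
            sstableCount++; // stand-in for writing one sstable
            memtable.clear();
        }
        // This model has a single memtable, so everything logged so far is now
        // durable: keep only the active (last) segment and mark it clean.
        while (segments.size() > 1) {
            segments.removeFirst();
        }
        if (!segments.isEmpty()) {
            segments.getLast().dirty = false;
        }
    }

    public static void main(String[] args) {
        StartupWithoutFlushSketch node = new StartupWithoutFlushSketch();
        node.start(List.of(Map.of("a", "1"), Map.of("b", "2")));
        System.out.println(node.sstableCount + " sstables, " + node.segments.size() + " segments");
        node.write("c", "3");
        node.flush();
        System.out.println(node.sstableCount + " sstables, " + node.segments.size() + " segments");
    }
}
```

The cost is the one raised in the comment above: until that later flush, the replayed data lives only in memtables and the old segments, so a restart in the meantime replays it again.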

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

