cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8383) Memtable flush may expire records from the commit log that are in a later memtable
Date Tue, 02 Dec 2014 21:12:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232130#comment-14232130 ]

Ariel Weisberg commented on CASSANDRA-8383:
-------------------------------------------

Does this deserve a regression test? I almost wish ReplayPosition implemented method wrappers
for GT, GTE, LT and LTE rather than relying on compareTo. For me there is mental overhead in
parsing that kind of condition.
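
A minimal sketch of such wrappers follows. It uses a simplified, hypothetical stand-in for
ReplayPosition (a commit log segment id plus an offset within that segment, ordered by segment
and then by offset); the names isAfter, isAfterOrEqual, isBefore and isBeforeOrEqual are
illustrative, not Cassandra's API.

```java
/**
 * Simplified, hypothetical stand-in for ReplayPosition: a commit log
 * segment id plus an offset within that segment, ordered by segment
 * first and then by offset. The named predicates are illustrative only.
 */
final class ReplayPosition implements Comparable<ReplayPosition> {
    final long segment;   // commit log segment id
    final int position;   // offset within the segment

    ReplayPosition(long segment, int position) {
        this.segment = segment;
        this.position = position;
    }

    @Override
    public int compareTo(ReplayPosition that) {
        int bySegment = Long.compare(segment, that.segment);
        return bySegment != 0 ? bySegment : Integer.compare(position, that.position);
    }

    // Named wrappers so call sites read as GT / GTE / LT / LTE instead of compareTo(...) checks.
    boolean isAfter(ReplayPosition that)         { return compareTo(that) > 0; }
    boolean isAfterOrEqual(ReplayPosition that)  { return compareTo(that) >= 0; }
    boolean isBefore(ReplayPosition that)        { return compareTo(that) < 0; }
    boolean isBeforeOrEqual(ReplayPosition that) { return compareTo(that) <= 0; }

    public static void main(String[] args) {
        ReplayPosition last = new ReplayPosition(3, 100);
        ReplayPosition write = new ReplayPosition(3, 250);
        // Reads as "last >= write" rather than "last.compareTo(write) >= 0".
        System.out.println(last.isAfterOrEqual(write)); // prints false
    }
}
```

With these, a check such as `last.compareTo(write) >= 0` reads as `last.isAfterOrEqual(write)`.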

If I understand correctly, when this race occurs and the writing thread loses, the write is
kicked forward to the next memtable even though the op group says it could go into the
current memtable.

So, for a memtable to accept a write: (either no barrier exists || the barrier exists but is
after the op group) && if the last replay position has been finalized, it must be >= the replay
position of the write.
If it has not been finalized, the writer updates the replay position, so the flusher correctly
gets the position of the last write to the memtable.
If the replay position has been finalized, then even though the op group says the write could
go into this memtable, the write is kicked into the next one. That is harmless, and op order
still works since it chains dependencies in order.

In effect, the last replay position is frozen earlier, so that when the second op group is
created and starts interleaving in the commit log (CL), anything beyond the frozen position is
not considered for truncation after the memtable flushes.

I think this does what I just described, and that it fixes the problem in the issue: once the
next op group is created, CL entries from different op groups interleave around the truncation
point used for the CL. Freezing the truncation point before creating the second op group solves
the problem.
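
The admission rule and the freeze-then-barrier ordering described above can be modeled roughly
as follows. This is a single-memtable sketch under simplifying assumptions, not Cassandra's
Memtable/OpOrder API: op groups are increasing long ids, CL positions are increasing longs, and
the barrier is recorded as the id of the first op group that belongs to the next memtable.
MemtableModel, accepts, freeze and issueBarrier are hypothetical names, and freeze taking the max
of the CL context and the writers' max is a modeling choice.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical model of one memtable's write admission. Assumptions (not
 * Cassandra's API): op group ids and CL positions are increasing longs,
 * and the barrier is the id of the first op group for the NEXT memtable.
 */
final class MemtableModel {
    /** Last CL position owned by this memtable; frozen once the flusher finalizes it. */
    private static final class Last {
        final long position;
        final boolean frozen;
        Last(long position, boolean frozen) { this.position = position; this.frozen = frozen; }
    }

    private final AtomicReference<Last> last = new AtomicReference<>();
    // Long.MAX_VALUE stands for "no barrier issued yet": every op group may write here.
    private volatile long firstRejectedGroup = Long.MAX_VALUE;

    /** Writer side: may a write from opGroupId at CL position clPosition go into this memtable? */
    boolean accepts(long opGroupId, long clPosition) {
        // Barrier issued and this op group is not before it: destined for a later memtable.
        if (opGroupId >= firstRejectedGroup)
            return false;
        while (true) {
            Last current = last.get();
            if (current != null && current.frozen)
                return current.position >= clPosition;  // finalized: obey the frozen bound
            if (current != null && current.position >= clPosition)
                return true;                             // current max already covers this write
            // Not finalized: raise the max so the flusher sees this write's position.
            if (last.compareAndSet(current, new Last(clPosition, false)))
                return true;
            // Lost a race with another writer or with freeze(); re-read and retry.
        }
    }

    /** Flusher step 1: freeze the truncation point at or above every accepted position. */
    long freeze(long clContext) {
        while (true) {
            Last current = last.get();
            if (current != null && current.frozen)
                return current.position;
            long bound = current == null ? clContext : Math.max(clContext, current.position);
            if (last.compareAndSet(current, new Last(bound, true)))
                return bound;
        }
    }

    /** Flusher step 2, only after freeze(): op groups >= nextGroupId go to the next memtable. */
    void issueBarrier(long nextGroupId) {
        firstRejectedGroup = nextGroupId;
    }

    public static void main(String[] args) {
        MemtableModel old = new MemtableModel();
        old.accepts(0, 1);                        // op group 0 writes at CL positions 1 and 2
        old.accepts(0, 2);
        long bound = old.freeze(2);               // step 1: truncation point frozen at 2
        old.issueBarrier(1);                      // step 2: op group 1 belongs to the next memtable
        System.out.println(bound);                // prints 2
        System.out.println(old.accepts(0, 3));    // prints false: beyond the bound, kicked forward
        System.out.println(old.accepts(1, 4));    // prints false: op group after the barrier
    }
}
```

The order of the two flusher steps is the point of the sketch: assuming an op group only
allocates CL positions after it starts, any group created after issueBarrier gets positions past
the frozen bound, so none of its entries can sit at or below the truncation point while living in
the next memtable.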

> Memtable flush may expire records from the commit log that are in a later memtable
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8383
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8383
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Critical
>              Labels: commitlog
>             Fix For: 2.1.3
>
>
> This is a pretty obvious bug given any care or thought, so I'm not sure how I managed to introduce
> it. We use OpOrder to ensure all writes to a memtable have finished before flushing; however,
> we also use this OpOrder to direct writes to the correct memtable. This is insufficient,
> since the OpOrder is only a partial order: an operation from the "future" (i.e. for the next
> memtable) could still interleave with the "past" operations in such a way that it grabs a
> CL entry in between the "past" operations. Since we simply take the max ReplayPosition of those
> in the past, any interleaved future operations would be expired even though they haven't been
> persisted to disk.
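
The interleaving in the description can be reproduced deterministically with a hypothetical log
of CL entries, each tagged with whether its op group targets the current ("past") or the next
("future") memtable. InterleavingDemo and its methods are illustrative names, and treating every
entry at or below the truncation point as expired is a modeling assumption.

```java
import java.util.List;

/**
 * Hypothetical illustration of the race: taking the max position of the
 * "past" entries as the truncation point also covers an interleaved
 * "future" entry that has not been flushed yet.
 */
final class InterleavingDemo {
    /** One CL entry: its position and whether its op group targets the next memtable. */
    static final class Entry {
        final long position;
        final boolean future;
        Entry(long position, boolean future) { this.position = position; this.future = future; }
    }

    /** The problematic rule: truncation point = max position among "past" entries. */
    static long maxPastPosition(List<Entry> log) {
        long max = Long.MIN_VALUE;
        for (Entry e : log)
            if (!e.future)
                max = Math.max(max, e.position);
        return max;
    }

    /** Count "future" entries at or below the truncation point: expired but not yet persisted. */
    static long unsafelyExpired(List<Entry> log, long truncationPoint) {
        long count = 0;
        for (Entry e : log)
            if (e.future && e.position <= truncationPoint)
                count++;
        return count;
    }

    public static void main(String[] args) {
        // CL allocation order: past, past, FUTURE, past, future
        List<Entry> log = List.of(
                new Entry(1, false), new Entry(2, false), new Entry(3, true),
                new Entry(4, false), new Entry(5, true));
        long point = maxPastPosition(log);
        // prints: truncation point = 4, future entries expired early = 1
        System.out.println("truncation point = " + point
                + ", future entries expired early = " + unsafelyExpired(log, point));
    }
}
```

Here the future entry at position 3 lies below the past maximum of 4, so it would be discarded
from the commit log although its memtable has not been flushed.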



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
