cassandra-commits mailing list archives

From "Jason Brown (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-6503) sstables from stalled repair sessions become live after a reboot and can resurrect deleted data
Date Thu, 23 Jan 2014 13:43:38 GMT


Jason Brown updated CASSANDRA-6503:

    Attachment: 6503_2.0-v2.diff

Attached v2 patch has the following changes:

- Changed StreamReceiveTask to keep a collection of SSTW rather than SSTR. This allows us
to do the conversion of SSTW to SSTR all together after we've gotten all the streamed files.
Also fixed up the code paths leading here so they pass SSTW.

- Also in StreamReceiveTask, added an abort() method, which will discard the SSTWs it has
buffered up. Changed StreamSession so that when a session ends in failure, it calls the new
STR.abort() method.

- Split FileMessage out into IncomingFileMessage and OutgoingFileMessage. I needed to do this
since each one holds a different subclass of SSTable, but also because Java generics don't
allow me to return different subclasses from StreamMessage.Serializer<V extends StreamMessage>.
This necessitated the changes in StreamMessage as I couldn't have one serializer for both
IncomingFileMessage and OutgoingFileMessage.  As it didn't seem best to create a new StreamMessage.Type
(something like FILE_IN and FILE_OUT) just to represent the FILE message type's behavior on
inbound vs. outbound, I instead split the SM.Type.serializer into two variables: inSerializer
and outSerializer. For all the other Types, the in and out serializers are the same class;
in the case of Type.FILE, this is where I'm referencing IncomingFileMessage.serializer and
OutgoingFileMessage.serializer, respectively. This seemed the cleanest way to introduce the
now-bifurcated life of Type.FILE/FileMessage.

- Added StreamLockfile to satisfy [~yukim]'s request for a mechanism to remove, on restart,
the subset of SSTRs that were successfully converted when others from its stream session
failed. This assumes the process crashed in the middle of converting the SSTWs to SSTRs.
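To illustrate the first two items, here is a minimal sketch of a receive task that buffers writers and converts them to readers only once every expected file has arrived, with an abort() that discards whatever is buffered. It is not the code in the attached patch: SketchWriter, SketchReader and ReceiveTaskSketch are hypothetical stand-ins for SSTW, SSTR and StreamReceiveTask, and the method names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SSTableWriter (SSTW); not Cassandra's real class.
class SketchWriter {
    final String name;
    boolean aborted = false;
    SketchWriter(String name) { this.name = name; }
    // Finishing a writer yields a reader, i.e. a live sstable.
    SketchReader closeAndOpenReader() { return new SketchReader(name); }
    // Real code would also delete the temporary files on disk.
    void abort() { aborted = true; }
}

// Hypothetical stand-in for SSTableReader (SSTR).
class SketchReader {
    final String name;
    SketchReader(String name) { this.name = name; }
}

// Hypothetical stand-in for StreamReceiveTask.
class ReceiveTaskSketch {
    private final int expectedFiles;
    private final List<SketchWriter> buffered = new ArrayList<>();
    private boolean done = false;

    ReceiveTaskSketch(int expectedFiles) { this.expectedFiles = expectedFiles; }

    // Called once per streamed file. Returns an empty list until the last
    // expected file arrives, then converts all buffered writers together.
    synchronized List<SketchReader> received(SketchWriter writer) {
        List<SketchReader> readers = new ArrayList<>();
        if (done) {
            // A file arriving after completion or abort is simply discarded.
            writer.abort();
            return readers;
        }
        buffered.add(writer);
        if (buffered.size() < expectedFiles) {
            return readers;
        }
        for (SketchWriter w : buffered) {
            readers.add(w.closeAndOpenReader());
        }
        buffered.clear();
        done = true;
        return readers;
    }

    // Called when the session ends in failure: nothing buffered becomes live.
    synchronized void abort() {
        if (done) {
            return;
        }
        for (SketchWriter w : buffered) {
            w.abort();
        }
        buffered.clear();
        done = true;
    }
}
```

The point of the design is that no reader exists until the whole session's files are in hand, so a stalled or failed session leaves nothing that could become live later.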

In the first patch, I chose to write the lockfile out to the commitlog directory. I did this
as it seems like overkill to add another yaml setting (and Config/DD change) just for this
value. Thus, I wanted to piggyback off something else that we already have, and DD.getCommitLogDirectory
seemed the least worst. I'm open to suggestions on this.
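As a rough illustration of the lockfile idea (not the StreamLockfile class in the attached patch), the sketch below records the sstable paths of a session before conversion starts, deletes the lockfile once conversion completes, and on restart removes any sstables still listed in a leftover lockfile. The class name, the "stream-<id>.lockfile" naming scheme and the one-path-per-line format are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; file naming and format are assumptions.
class StreamLockfileSketch {
    private final Path lockfile;

    StreamLockfileSketch(Path directory, String sessionId) {
        this.lockfile = directory.resolve("stream-" + sessionId + ".lockfile");
    }

    // Written before any writer is converted: one absolute path per line.
    void create(List<Path> sstables) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Path p : sstables) {
            lines.add(p.toAbsolutePath().toString());
        }
        Files.write(lockfile, lines, StandardCharsets.UTF_8);
    }

    // Called after every writer in the session was converted successfully.
    void delete() throws IOException {
        Files.deleteIfExists(lockfile);
    }

    // Called on restart. A surviving lockfile means the process died in the
    // middle of conversion, so every sstable it lists is removed.
    // Returns the number of sstable files actually deleted.
    static int cleanup(Path directory) throws IOException {
        List<Path> lockfiles = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(directory, "stream-*.lockfile")) {
            for (Path p : stream) {
                lockfiles.add(p);
            }
        }
        int removed = 0;
        for (Path lf : lockfiles) {
            for (String line : Files.readAllLines(lf, StandardCharsets.UTF_8)) {
                if (!line.isEmpty() && Files.deleteIfExists(Paths.get(line))) {
                    removed++;
                }
            }
            Files.delete(lf);
        }
        return removed;
    }
}
```

In this sketch the directory passed to the constructor and to cleanup() would be whatever location is chosen for the lockfile, such as the commitlog directory discussed above.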

Once these changes are incorporated into 2.0 and trunk, I would still like to do something
for 1.2 but I do not think we need to be as extensive as what we're doing for 2.0+. Perhaps
leave out the lockfile and the abort(), and just leave the deferring of converting SSTW to
SSTR until the end of the session (basically what the current 1.2 patch does, but I'll check
it out again after the 2.0 stuff is good).

> sstables from stalled repair sessions become live after a reboot and can resurrect deleted data
> -----------------------------------------------------------------------------------------------
>                 Key: CASSANDRA-6503
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeremiah Jordan
>            Assignee: Jason Brown
>            Priority: Minor
>             Fix For: 1.2.14, 2.0.5
>         Attachments: 6503_2.0-v2.diff, 6503_c1.2-v1.patch
> The sstables streamed in during a repair session don't become active until the session
finishes.  If something causes the repair session to hang for some reason, those sstables
will hang around until the next reboot, and become active then.  If you don't reboot for 3
months, this can cause data to resurrect, as GC grace has expired, so tombstones for the data
in those sstables may have already been collected.

This message was sent by Atlassian JIRA
