cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6557) CommitLogSegment may be duplicated in unlikely race scenario
Date Fri, 17 Jan 2014 04:54:19 GMT


Jonathan Ellis commented on CASSANDRA-6557:

Just to make sure we're talking about the same problem:

Say I have allocatingFrom = X, active = [X, Y], and available = [Z].

Someone calls advanceAllocatingFrom.  We iterate over available, and try to swap the first
element into allocatingFrom.

So now we have:
old = X
allocatingFrom = Z
active = [X, Y]
available = [Z]

Our next step is to remove Z from available and add it to active, but if another thread calls
advanceAllocatingFrom before that happens, both the initial thread and the other one will
add Z to active, so we'll have [X, Y, Z, Z], which is a violation of our design.
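
To make the interleaving concrete, here is a minimal single-threaded replay of it in Java. It is only a sketch: the class name, the use of strings as segments, and the plain lists are hypothetical stand-ins, not the actual CommitLog code. It also assumes the second caller's expected value is Z (Z has already filled up), which matches the issue description quoted below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical replay of the race; strings stand in for CommitLogSegment objects.
final class DuplicateSegmentRace {
    static List<String> replay() {
        AtomicReference<String> allocatingFrom = new AtomicReference<>("X");
        List<String> active = new ArrayList<>(Arrays.asList("X", "Y"));
        List<String> available = new ArrayList<>(Arrays.asList("Z"));

        // Thread A: look at the head of available and CAS X -> Z,
        // then get descheduled before doing its cleanup.
        String nextA = available.get(0);
        boolean wonA = allocatingFrom.compareAndSet("X", nextA);

        // Thread B: Z is still in available, so B picks the same candidate.
        // Assumption: B expects Z because Z has already been used up.
        String nextB = available.get(0);
        boolean wonB = allocatingFrom.compareAndSet("Z", nextB);

        // Both threads now run the cleanup step for the segment they "won".
        if (wonA) { available.remove(nextA); active.add(nextA); }
        if (wonB) { available.remove(nextB); active.add(nextB); }
        return active;
    }
}
```

Calling DuplicateSegmentRace.replay() returns [X, Y, Z, Z]: both CAS operations succeed because Z is visible in available to both callers.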

It seems to me that we can fix this much more easily by simply dequeuing the element from
available before trying to CAS.  If we fail, we can add it back.  This is a relatively rare
operation, and a race even more rare, so it doesn't have to be super optimized.
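
A rough Java sketch of the dequeue-before-CAS idea follows. The class, field types, and return value are assumptions made for illustration (strings stand in for segments); this is not the committed patch.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the allocator state; not Cassandra's actual class.
final class SegmentAllocatorSketch {
    final AtomicReference<String> allocatingFrom = new AtomicReference<>();
    final Queue<String> active = new ConcurrentLinkedQueue<>();
    final Queue<String> available = new ConcurrentLinkedQueue<>();

    /** Returns true if this call swapped a new segment into allocatingFrom. */
    boolean advanceAllocatingFrom(String old) {
        // Dequeue first: once polled, no other thread can see this candidate.
        String next = available.poll();
        if (next == null)
            return false; // nothing available; a real implementation would wait for a new segment

        if (allocatingFrom.compareAndSet(old, next)) {
            // Only the thread that dequeued the segment and won the CAS adds it.
            active.add(next);
            return true;
        }

        // Lost the CAS: another thread already advanced, so add the segment back.
        available.add(next);
        return false;
    }
}
```

With allocatingFrom = X, active = [X, Y], and available = [Z], the first call to advanceAllocatingFrom("X") moves Z into active exactly once, and a second call (with either X or Z as the expected value) finds available empty and adds nothing. One side effect of this sketch is that a segment returned after a failed CAS goes to the tail of available, which seems acceptable given how rare the operation is.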

> CommitLogSegment may be duplicated in unlikely race scenario
> ------------------------------------------------------------
>                 Key: CASSANDRA-6557
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 2.1
>            Reporter: Benedict
>             Fix For: 2.1
>
> In the unlikely event that the thread that switched to a new CLS has not finished executing
> the cleanup of its switch by the time the CLS has finished being used, it is possible for
> the same segment to be 'switched' in again. This would be benign except that it is added to
> the activeSegments queue a second time also, which would permit it to be recycled twice, creating
> two different CLS objects in memory pointing to the same CLS on disk, after which all bets
> are off.
> The issue is highly unlikely to occur, but highly unlikely means it will probably happen
> eventually. I've fixed this based on my patch for CASSANDRA-5549, using the NonBlockingQueue
> I introduce there to simplify the logic and make it more obviously correct.
