cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] Updated: (CASSANDRA-2105) Fix the read race condition in CFStore for counters
Date Fri, 04 Feb 2011 16:58:12 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2105:
----------------------------------------

    Attachment: 2115_option2_nolock.patch
                2115_option1_withLock.patch

I've attached not one but two options for this patch. I'm not sure which version to go with,
so I'm asking for opinions.

Version 1 is the one extracted from #1546. It uses a ReadWriteLock to protect against the race
condition.
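
As a rough illustration of the locking idea (not the patch itself), the sketch below
assumes a hypothetical, heavily simplified store in which data sources are plain strings.
The class and method names are invented for this example. Readers grab both reference
lists under the read lock, and the flush completion swaps the memtable for its sstable
under the write lock, so a reader never observes the two lists in an in-between state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for a column family store; not the actual patch.
final class LockedStoreSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> pendingFlush = new ArrayList<>(); // memtables being flushed
    private final List<String> sstables = new ArrayList<>();     // live sstables

    // Register a memtable as pending flush.
    void addPendingFlush(String memtable) {
        lock.writeLock().lock();
        try {
            pendingFlush.add(memtable);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Flush completion: drop the memtable and publish its sstable as one step.
    void completeFlush(String memtable, String sstable) {
        lock.writeLock().lock();
        try {
            pendingFlush.remove(memtable);
            sstables.add(sstable);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reader side: both lists are copied while holding the read lock,
    // so no flush completion can slip in between the two grabs.
    List<String> snapshotSources() {
        lock.readLock().lock();
        try {
            List<String> sources = new ArrayList<>(pendingFlush);
            sources.addAll(sstables);
            return sources;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Many readers can hold the read lock at once; they only block while a flush completion holds
the write lock, which is the contention cost weighed against Version 2.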

Version 2 doesn't use a lock, so there is less chance of lock contention, which is always good.
The only problem is that, in theory, it still suffers from a race condition. But I think this
race condition is borderline impossible.
Basically, given a memtable m being flushed, let's call s(m) the sstable initially produced
by its flush, and let's denote by s'(m) any sstable resulting from the compaction of s(m).
The race occurs if a read thread sees m when grabbing the references to the memtables being
flushed and sees s'(m) when grabbing the references to the sstables (seeing s(m) rather than
s'(m) is the initial race condition, and that one is not impossible at all).
If this is unclear, the code has a comment that may explain it more clearly.
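
To make the m / s(m) / s'(m) distinction concrete, here is a small sketch. It assumes,
purely for illustration, that a lock-free reader skips any sstable it can recognize as the
direct flush s(m) of a memtable m it already holds. That rule, the naming scheme, and the
class are all hypothetical and are not taken from the attached patch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only; the dedup rule below is an assumption, not the patch's logic.
final class ResidualRaceSketch {
    // Hypothetical naming rule: the sstable first written for memtable m is "s(m)".
    static String flushedName(String memtable) {
        return "s(" + memtable + ")";
    }

    // Keep every memtable seen, and every sstable seen that is not the
    // direct flush of one of those memtables.
    static List<String> sourcesToRead(List<String> seenMemtables, List<String> seenSstables) {
        Set<String> alreadyCovered = new HashSet<>();
        for (String m : seenMemtables) {
            alreadyCovered.add(flushedName(m));
        }
        List<String> sources = new ArrayList<>(seenMemtables);
        for (String s : seenSstables) {
            if (!alreadyCovered.contains(s)) {
                sources.add(s);
            }
        }
        return sources;
    }
}
```

Under this assumed rule, a reader that sees m and then s(m) reads the data once, but a reader
that sees m and then a compacted s'(m) cannot tell the two overlap and reads the data twice.
For that to happen, both the flush and a compaction of the fresh sstable must complete
between the reader's two grabs.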

So I'm not sure which version to go with. I lean slightly towards Version 1 because I usually
side with correctness before anything else, but since this is in a critical path it feels slightly
wasteful to use a lock for this, given how remote the race condition of Version 2 seems.


> Fix the read race condition in CFStore for counters 
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2105
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2105
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: counters
>             Fix For: 0.8
>
>         Attachments: 2115_option1_withLock.patch, 2115_option2_nolock.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> There is a (known) race condition during counter reads. Indeed, for standard
> column families there is a short window during which a memtable is both active and
> pending flush, and similarly a short window during which a 'memtable' is both
> pending flush and an active sstable. For counters, that would imply sometimes
> reconciling the same counterColumn twice during a read, and thus over-counting.
> The current code changes this slightly by trading the possibility of counting a
> given counterColumn twice for the possibility of missing a counterColumn. Thus it
> trades over-counts for under-counts.
> But this is no fix, and there is no hope of offering clients any kind of guarantee
> on reads unless we fix this.
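
The over-count and under-count described above can be replayed deterministically. The sketch
below is single-threaded and uses invented names. It assumes a counter delta that lives in a
memtable pending flush, and it lets the flush complete between the reader's two unsynchronized
grabs. Reversing the order of the grabs is shown only as one way an over-count can turn into an
under-count; whether the current code does exactly that is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Deterministic, single-threaded replay of the interleaving; all names are hypothetical.
final class CounterReadRaceSketch {
    private final List<Long> pendingFlush = new ArrayList<>(); // deltas in memtables being flushed
    private final List<Long> sstables = new ArrayList<>();     // deltas in live sstables

    CounterReadRaceSketch(long deltaInFlushingMemtable) {
        pendingFlush.add(deltaInFlushingMemtable);
    }

    // Flush completion: the same delta moves from the memtable to its sstable.
    void completeFlush() {
        sstables.addAll(pendingFlush);
        pendingFlush.clear();
    }

    private static long sum(List<Long> deltas) {
        long total = 0;
        for (long d : deltas) {
            total += d;
        }
        return total;
    }

    // Reader grabs memtables, the flush completes, then the reader grabs sstables.
    long readMemtablesThenSstables() {
        List<Long> seenMemtables = new ArrayList<>(pendingFlush);
        completeFlush(); // stands in for a concurrent flush finishing at this point
        List<Long> seenSstables = new ArrayList<>(sstables);
        return sum(seenMemtables) + sum(seenSstables);
    }

    // Same race with the two grabs in the opposite order.
    long readSstablesThenMemtables() {
        List<Long> seenSstables = new ArrayList<>(sstables);
        completeFlush(); // stands in for a concurrent flush finishing at this point
        List<Long> seenMemtables = new ArrayList<>(pendingFlush);
        return sum(seenMemtables) + sum(seenSstables);
    }
}
```

With a true delta of 5, the first read order sums the delta from both the memtable and its
sstable, and the second order finds it in neither. Neither result is the correct value.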

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
