cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-1546) (Yet another) approach to counting
Date Mon, 04 Oct 2010 09:13:35 GMT


Sylvain Lebresne commented on CASSANDRA-1546:

bq. You're right that there is a problem w/ AE repair. Last time I read the 0.7 code, AE repair
works like so:

Actually, I do think that AE repair is fine in the current patch. The idea is
the following: for a counter, each host has its own LocalCounterColumn named
after the node IP. That node should be the only one to insert such columns
(because those columns merge their values, only one should be created for
each new counter increment). The patch achieves this during column
deserialization. Each node that deserializes a LocalCounterColumn will
deserialize it as a CounterColumn (which acts as a normal column, except that
it has no effect on the node the column originates from) unless it is that
node's own LocalCounterColumn. So no host can send us back one of our
LocalCounterColumns, since it will never see it as a LocalCounterColumn. It is
true that, because of streaming, a node can have a LocalCounterColumn for
another host in one of its sstables. But as soon as it deserializes it, it
becomes a (harmless) CounterColumn. So, as long as we don't stream back the
exact same sstable that we have streamed in (which we never do, to my
knowledge), nothing is broken. Feel free to correct me, of course.
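
The deserialization rule described above can be sketched roughly as follows. This is only an illustration, not code from the patch: the class layout, the `deserialize` signature, and the example IP addresses are assumptions made for this sketch. Only the names LocalCounterColumn and CounterColumn come from the comment itself.

```java
public class CounterDeserializationSketch {
    // Hypothetical stand-in for a counter column: the IP of the node that
    // owns it plus that node's partial count.
    static abstract class Column {
        final String ownerIp;
        final long value;

        Column(String ownerIp, long value) {
            this.ownerIp = ownerIp;
            this.value = value;
        }
    }

    // Column whose values are merged (summed) on the owning node.
    static final class LocalCounterColumn extends Column {
        LocalCounterColumn(String ownerIp, long value) {
            super(ownerIp, value);
        }
    }

    // Column that behaves like a normal column on every other node.
    static final class CounterColumn extends Column {
        CounterColumn(String ownerIp, long value) {
            super(ownerIp, value);
        }
    }

    // Assumed rule: a column is materialized as a LocalCounterColumn only on
    // the node it is named after; everywhere else it becomes a CounterColumn.
    static Column deserialize(String ownerIp, long value, String localIp) {
        return ownerIp.equals(localIp)
            ? new LocalCounterColumn(ownerIp, value)
            : new CounterColumn(ownerIp, value);
    }

    public static void main(String[] args) {
        String localIp = "10.0.0.1"; // hypothetical address of this node
        Column mine = deserialize("10.0.0.1", 5, localIp);
        Column streamed = deserialize("10.0.0.2", 7, localIp);
        System.out.println(mine.getClass().getSimpleName());
        System.out.println(streamed.getClass().getSimpleName());
    }
}
```

With this rule, a column streamed in from another host never takes the merging code path on the receiving node, which is the property the argument above relies on.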

Apart from this, I'm attaching an updated version of the patch (which I'm
calling v2 to keep some history). It includes two important updates:

  * It fixes (I hope) a fairly nasty race condition (that I think #1072 also
    suffers from, btw). The idea is that since we merge the values of
    LocalCounterColumns, we should never, during a read, merge the exact same
    column twice (the risk being to return a wrong result). Even though this
    usually doesn't happen, it could happen in the following situations:
      *# When we switch memtables
      *# When a memtable that has been fully flushed becomes an active sstable
      *# At the end of a compaction, when the newly created sstable becomes active
      *# Any other place I'm forgetting
    This could happen because those operations aren't atomic (for reads) and
    thus there is a small delay during which we could read twice from the same
    memtable/sstable. The patch introduces a new readWriteLock that should
    solve this. The changes are fully in and for those
    interested (and I'll be happy to have some feedback, cause I'm not sure
    how to test this).

  * It moves the handling of the 'marker columns' to a different (and optional)
    CF. Having done this, I've removed the counter-as-row option, so the patch
    only has counters-as-superColumns, which in turn simplifies the thrift API,
    making the API for counters more similar to the rest of the API.
    I want to mention that the marker idea is still a work in progress and I'm
    trying some changes. So feel free to ignore it on a first read (it's fully
    optional anyway). But the overall goal I'm pursuing with this is to be
    able to replay an update when we don't know if it made it in. This is
    clearly hard if you want to be partition tolerant and not incur too much
    overhead on normal (non-failing) operations. So I'm leaning towards a
    system where a replayed update could mean temporary inconsistency, but will
    ensure that eventually the count will be right. But more on this later.
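
To illustrate the race condition from the first update, here is a minimal sketch of guarding the memtable/sstable switch with a read-write lock. It is not the patch's code: the class, its fields, and the modelling of each source as a single partial count are assumptions made for this example. For simplicity, increments also take the write lock here, which a real implementation would presumably avoid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SourceSwitchSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Hypothetical model: each source holds one partial count.
    private long memtable = 0;
    private final List<Long> sstables = new ArrayList<>();

    // Simplification: increments take the write lock so the sketch stays
    // thread-safe without any further synchronization.
    public void increment(long delta) {
        lock.writeLock().lock();
        try {
            memtable += delta;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // The switch: the flushed value appears as an sstable and leaves the
    // memtable under one write lock, so a reader can never observe the same
    // partial count in both places.
    public void flush() {
        lock.writeLock().lock();
        try {
            sstables.add(memtable);
            memtable = 0;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reads merge all sources under the read lock, so they see the state
    // either entirely before or entirely after a switch.
    public long read() {
        lock.readLock().lock();
        try {
            long total = memtable;
            for (long partial : sstables) {
                total += partial;
            }
            return total;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        SourceSwitchSketch counter = new SourceSwitchSketch();
        counter.increment(3);
        counter.increment(4);
        counter.flush();
        counter.increment(5);
        System.out.println(counter.read()); // prints 12
    }
}
```

Without the lock, a read that runs between `sstables.add(memtable)` and `memtable = 0` would count the flushed value twice, which is the kind of double merge described above.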

> (Yet another) approach to counting
> ----------------------------------
>                 Key: CASSANDRA-1546
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 0.7.0
>         Attachments: 0001-Remove-IClock-from-internals.patch, 0002-Counters.patch, 0003-Generated-thrift-files-changes.patch
> This could be described as a mix between CASSANDRA-1072 without clocks and CASSANDRA-1421.
> More details in the comment below.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
