> is (2) a direct consequence of a repair on the full token range (and thus anti-compaction ran only on a subset of the RF nodes)?

Not necessarily, because even with -pr enabled the nodes will be responsible for different ranges, so they will flush and compact at different instants. The effect of this on long-running repairs is that data marked as repaired on one replica may be compacted on another replica, where it will then not be marked as repaired due to CASSANDRA-9143, which will cause a mismatch in the next repair. This could probably be alleviated by CASSANDRA-6696.
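A toy model may help show why a compaction that overlaps an incremental repair produces a mismatch in the following repair. This is a minimal Python sketch under simplifying assumptions, not Cassandra's implementation: each replica is reduced to a set of keys in unrepaired SSTables and a set in repaired SSTables, validation compares only the unrepaired sets, and a replica listed in the hypothetical `compacted_during_repair` argument skips anticompaction, mimicking the CASSANDRA-9143 limitation. The names `Replica`, `validate` and `incremental_repair` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    unrepaired: set = field(default_factory=set)  # keys in unrepaired SSTables
    repaired: set = field(default_factory=set)    # keys in repaired SSTables


def validate(replicas):
    """In this model, incremental validation only compares unrepaired data."""
    first = replicas[0].unrepaired
    return all(r.unrepaired == first for r in replicas)


def incremental_repair(replicas, compacted_during_repair=frozenset()):
    """Return True if validation found the replicas in sync."""
    in_sync = validate(replicas)
    merged = set().union(*(r.unrepaired for r in replicas))
    for r in replicas:
        r.unrepaired |= merged  # streaming fills in whatever a replica lacked
        if r.name in compacted_during_repair:
            # Modelled CASSANDRA-9143 limitation: SSTables compacted mid-repair
            # are not anticompacted, so their data stays unrepaired.
            continue
        r.repaired |= r.unrepaired  # anticompaction promotes the data
        r.unrepaired = set()
    return in_sync


replicas = [Replica(n, {"k1", "k2"}) for n in ("A", "B", "C")]
first = incremental_repair(replicas, compacted_during_repair={"C"})
second = incremental_repair(replicas)
print(first, second)  # True False
```

All three replicas hold identical rows throughout, yet the second validation reports a mismatch only because C kept its rows in the unrepaired set; in this model A and B then get those same rows streamed back to them.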

2016-10-03 12:16 GMT-03:00 Stefano Ortolani <ostefano@gmail.com>:
I was wondering: is (2) a direct consequence of a repair on the full
token range (and thus anti-compaction ran only on a subset of the RF
nodes)? If I understand correctly, a repair with -pr should fix this,
at the cost of all nodes performing the anticompaction phase?
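To make the -pr trade-off concrete, here is a minimal sketch assuming a single token per node and SimpleStrategy-style placement (the range ending at node i's token is stored on node i and the next RF-1 nodes clockwise); vnodes and NetworkTopologyStrategy are ignored, and the helper names are hypothetical. It counts how often each range gets repaired when every node runs a full-range repair versus a primary-range (-pr) repair, and lists which replicas take part in a single -pr session.

```python
from collections import Counter


def replicas_for_range(nodes, i, rf):
    """Hypothetical SimpleStrategy-like placement, one token per node:
    the range ending at nodes[i]'s token lives on nodes[i] and the next rf-1 nodes."""
    return [nodes[(i + k) % len(nodes)] for k in range(rf)]


def ranges_repaired(nodes, node, rf, primary_only):
    """Indexes of the ranges covered by a repair started on `node`."""
    if primary_only:
        return [nodes.index(node)]  # -pr: only the node's primary range
    # Full range: every range the node holds a replica of.
    return [i for i in range(len(nodes)) if node in replicas_for_range(nodes, i, rf)]


nodes = ["n1", "n2", "n3", "n4", "n5"]
rf = 3
full = Counter(r for n in nodes for r in ranges_repaired(nodes, n, rf, primary_only=False))
pr = Counter(r for n in nodes for r in ranges_repaired(nodes, n, rf, primary_only=True))

print(sorted(full.values()))  # [3, 3, 3, 3, 3]: each range repaired RF times
print(sorted(pr.values()))    # [1, 1, 1, 1, 1]: each range repaired once
# Replicas taking part in the -pr session started on n1:
print(replicas_for_range(nodes, nodes.index("n1"), rf))  # ['n1', 'n2', 'n3']
```

In this model, running a full-range repair on every node repairs each range RF times, while -pr repairs each range once; each -pr session still involves all RF replicas of its range, so every one of them goes through anticompaction for that range.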


On Tue, Sep 27, 2016 at 4:09 PM, Stefano Ortolani <ostefano@gmail.com> wrote:
> Didn't know about (2), and I actually have a time drift between the nodes.
> Thanks a lot Paulo!
> Regards,
> Stefano
> On Thu, Sep 22, 2016 at 6:36 PM, Paulo Motta <pauloricardomg@gmail.com> wrote:
>> There are a couple of things that could be happening here:
>> - There will be time differences between when the nodes participating in
>> the repair flush, so in write-heavy tables there will always be minor
>> differences during validation, and those could be accentuated by
>> low-resolution Merkle trees, which will mostly affect larger tables.
>> - SSTables compacted during incremental repair will not be marked as
>> repaired, so nodes with different compaction cadences will have different
>> data in their unrepaired set, which will cause mismatches in subsequent
>> incremental repairs. CASSANDRA-9143 will hopefully fix that limitation.
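The effect of Merkle tree resolution can be illustrated with a toy model: a single row that differs between replicas at validation time (for example, a write that landed just after one replica flushed) marks its whole leaf as mismatched, so every row in that leaf's token range is exchanged. This minimal Python sketch assumes a tiny hypothetical token space and hashes only the leaf level, which yields the same mismatched ranges a full tree comparison would find; it is not Cassandra's MerkleTree class, and `TOKEN_SPACE`, `leaf_hashes` and `rows_streamed` are illustrative names.

```python
import hashlib
import random

# Hypothetical, tiny token space so the example runs instantly.
TOKEN_SPACE = 2**16


def leaf_hashes(rows, depth):
    """Hash a replica's rows into 2**depth equal-width token-range leaves."""
    width = TOKEN_SPACE // 2**depth
    leaves = [hashlib.sha256() for _ in range(2**depth)]
    for token in sorted(rows):  # same order on every replica
        leaves[token // width].update(f"{token}:{rows[token]}".encode())
    return [leaf.digest() for leaf in leaves]


def rows_streamed(a, b, depth):
    """Count rows of replica `a` that fall in leaves whose hashes differ from `b`."""
    width = TOKEN_SPACE // 2**depth
    pairs = zip(leaf_hashes(a, depth), leaf_hashes(b, depth))
    mismatched = {i for i, (ha, hb) in enumerate(pairs) if ha != hb}
    return sum(1 for token in a if token // width in mismatched)


rng = random.Random(42)
tokens = rng.sample(range(TOKEN_SPACE), 10_000)
replica_a = {t: "v1" for t in tokens}
replica_b = dict(replica_a)
replica_b[tokens[0]] = "v2"  # a single row differs at validation time

for depth in (4, 8, 12):
    streamed = rows_streamed(replica_a, replica_b, depth)
    print(f"depth={depth:2d} leaves={2**depth:5d} rows streamed={streamed}")
```

Shallower trees mean fewer, wider leaves, so one differing row drags more identical rows into the repair; at a fixed depth, a larger table packs more rows into each leaf, which is why larger tables are hit hardest.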
>> 2016-09-22 7:10 GMT-03:00 Stefano Ortolani <ostefano@gmail.com>:
>>> Hi,
>>> I am seeing something weird while running repairs.
>>> I am testing 3.0.9, so I am running the repairs manually, node after node,
>>> on a cluster with RF=3. I am using a standard repair command (incremental,
>>> parallel, full range), and I just noticed that the third node detected some
>>> ranges out of sync with one of the nodes that just finished repairing.
>>> Since there were no dropped mutations, that sounds weird to me, considering
>>> that the repairs are supposed to operate on the whole range.
>>> Any idea why?
>>> Maybe I am missing something?
>>> Cheers,
>>> Stefano