I was wondering: is (2) a direct consequence of running a repair on the
full token range (so that anticompaction ran only on a subset of the RF
nodes)? If I understand correctly, a repair with -pr should fix this,
at the cost of all nodes performing the anticompaction phase?
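
To check my understanding of (2), here is a toy sketch in Python. It is not
Cassandra code: the function names, SSTable names and keys are all made up
for illustration, and it only models the behaviour Paulo describes (data in
SSTables compacted while the repair runs stays in the unrepaired set).

```python
# Toy model only; every name here is hypothetical, not a Cassandra API.

def finish_incremental_repair(sstables, compacted_during_repair):
    """Split a node's data after an incremental repair session.

    sstables: dict mapping SSTable name -> set of partition keys it holds,
    all of them unrepaired when the session starts.
    compacted_during_repair: names of SSTables compacted away mid-session.
    Their data is assumed to stay unrepaired, as described in (2).
    """
    repaired, unrepaired = set(), set()
    for name, keys in sstables.items():
        if name in compacted_during_repair:
            unrepaired |= keys
        else:
            repaired |= keys
    return repaired, unrepaired


def next_validation_mismatch(unrepaired_a, unrepaired_b):
    """Keys that only one node would validate in the next incremental repair."""
    return unrepaired_a ^ unrepaired_b


# Both replicas hold identical data before the repair.
sstables = {"sstable-1": {"k1", "k2"}, "sstable-2": {"k3"}}

# Node A happens to compact sstable-2 during the session, node B does not.
_, unrepaired_a = finish_incremental_repair(sstables, {"sstable-2"})
_, unrepaired_b = finish_incremental_repair(sstables, set())

mismatch = next_validation_mismatch(unrepaired_a, unrepaired_b)
print(sorted(mismatch))
```

In this model the two nodes hold the same data, yet the next incremental
repair would still report k3 as out of sync, because only node A still has
it in its unrepaired set.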
On Tue, Sep 27, 2016 at 4:09 PM, Stefano Ortolani <firstname.lastname@example.org> wrote:
> Didn't know about (2), and I actually have a time drift between the nodes.
> Thanks a lot Paulo!
> On Thu, Sep 22, 2016 at 6:36 PM, Paulo Motta <email@example.com> wrote:
>> There are a couple of things that could be happening here:
>> - There will be time differences between when the nodes participating in
>> the repair flush, so in write-heavy tables there will always be minor
>> differences during validation, and those can be accentuated by
>> low-resolution Merkle trees, which mostly affects larger tables.
>> - SSTables compacted during incremental repair will not be marked as
>> repaired, so nodes with different compaction cadences will have different
>> data in their unrepaired set, which will cause mismatches in the subsequent
>> incremental repairs. CASSANDRA-9143 will hopefully fix that limitation.
>> 2016-09-22 7:10 GMT-03:00 Stefano Ortolani <firstname.lastname@example.org>:
>>> I am seeing something weird while running repairs.
>>> I am testing 3.0.9 so I am running the repairs manually, node after node,
>>> on a cluster with RF=3. I am using a standard repair command (incremental,
>>> parallel, full range), and I just noticed that the third node detected some
>>> ranges out of sync with one of the nodes that just finished repairing.
>>> Since there were no dropped mutations, that sounds weird to me, considering
>>> that the repairs are supposed to operate on the whole range.
>>> Any idea why?
>>> Maybe I am missing something?