cassandra-user mailing list archives

From Omid Aladini <>
Subject Is Anti Entropy repair idempotent with respect to transferred data?
Date Tue, 16 Oct 2012 09:33:22 GMT

I was wondering whether data streamed by anti-entropy repair is
idempotent with respect to a fixed set of data and convergent with
respect to a mutating set of data, meaning that:

- Given that no mutations are happening on the cluster and we run
repair multiple times, data would only be transferred the first time
(meaning that the Merkle trees would be equal after applying repair once).

- If there are mutations on the cluster and we run repair multiple
times, the amount of data transferred on each repair is proportional
to the amount of messages lost since the last repair, plus whatever
differences arise because different nodes take their snapshots (on
which the Merkle trees are built) at slightly different times.
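To make the two expectations above concrete, here is a toy sketch in Python. It assumes a deliberately simplified model that is NOT Cassandra's implementation: each replica is a dict from an integer token to a (timestamp, value) cell, the token space is split into a fixed number of equal leaf ranges (TOKEN_SPACE and NUM_LEAVES are hypothetical parameters), every cell in a mismatched leaf range is streamed in both directions, and conflicts are reconciled by keeping the cell with the higher timestamp. All names are illustrative.

```python
import hashlib

# Hypothetical toy parameters: tokens live in [0, TOKEN_SPACE) and the space
# is split into NUM_LEAVES equal ranges, one Merkle leaf per range.
TOKEN_SPACE = 1024
NUM_LEAVES = 8  # must be a power of two for merkle_root below


def leaf_of(token):
    """Index of the leaf range that owns this token."""
    return token * NUM_LEAVES // TOKEN_SPACE


def leaf_hashes(replica):
    """Hash the (token, cell) pairs of each leaf range, in token order."""
    buckets = [[] for _ in range(NUM_LEAVES)]
    for token in sorted(replica):
        buckets[leaf_of(token)].append(f"{token}={replica[token]}")
    return [hashlib.sha256("|".join(b).encode()).hexdigest() for b in buckets]


def merkle_root(hashes):
    """Combine leaf hashes pairwise until a single root hash remains."""
    level = hashes
    while len(level) > 1:
        level = [hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
                 for i in range(0, len(level), 2)]
    return level[0]


def repair(a, b):
    """Stream every cell of each mismatched leaf range in both directions,
    keeping the cell with the higher (timestamp, value). Returns the number
    of cells streamed."""
    mismatched = {i for i, (ha, hb) in enumerate(zip(leaf_hashes(a), leaf_hashes(b)))
                  if ha != hb}
    streamed = 0
    for src, dst in ((a, b), (b, a)):
        for token, cell in list(src.items()):
            if leaf_of(token) in mismatched:
                streamed += 1
                if token not in dst or cell > dst[token]:
                    dst[token] = cell
    return streamed


# Two replicas that agree except for one write that replica_a missed.
replica_a = {t: (1, "v") for t in range(0, TOKEN_SPACE, 16)}
replica_b = dict(replica_a)
replica_b[5] = (2, "lost-on-a")

first = repair(replica_a, replica_b)
second = repair(replica_a, replica_b)
print(first, second)  # prints "17 0"
print(merkle_root(leaf_hashes(replica_a)) == merkle_root(leaf_hashes(replica_b)))  # prints "True"
```

With no further mutations, the second repair streams nothing because the trees already match, which is the idempotence in the first bullet. Note that even in this toy, fixing a single lost cell streams the 17 cells of the whole mismatched leaf range, since the tree only tells the replicas which range differs, not which cell.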

In my experience running repair on some counter data, the amount of
streamed data is much larger than could be explained by lost messages
or by snapshots being taken at different times.

I know the data ends up in sync after every repair, but I'm more
interested in whether Cassandra transfers excess data and how to
minimize this.

Does anybody have any insights on this?

