cassandra-user mailing list archives

From Romain Hardouin <romainh...@yahoo.fr>
Subject Repair: huge boost on C* 2.1 with CASSANDRA-12580
Date Fri, 14 Oct 2016 19:15:08 GMT
Hi all,

Many people here have trouble with repair, so I would like to share my experience with
the backport of CASSANDRA-12580 "Fix merkle tree size calculation" (thanks Paulo!) to our
C* 2.1.16. I was expecting some minor improvements, but the results are impressive on some
tables.

Because of a slow VPN between our EU and US AWS DCs, the massive drop in overstreaming is
a big win for us. On top of that, before the backport I used to see many RepairExceptions, and
their number grew with each repair. With this fix the graph shows only one exception on one node,
so it's negligible. Such exceptions are not critical because Cassandra-reaper retries the
failed work, but the retries are a waste of time.
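To illustrate why Merkle tree size matters for overstreaming, here is a minimal Python sketch under simplifying assumptions. It assumes a token range is split into 2**depth equally sized leaves, partitions are spread uniformly across leaves, each mismatched partition lands in a different leaf (worst case), and a mismatched leaf causes all of its partitions to be streamed. The function name `overstream_estimate` and this model are hypothetical; it is not Cassandra's code and does not show what CASSANDRA-12580 actually changed.

```python
import math


def overstream_estimate(partitions_in_range: int, depth: int, mismatched_partitions: int) -> dict:
    """Rough, hypothetical model of repair overstreaming for one token range.

    Assumptions (not taken from Cassandra's implementation):
    - the range is divided into 2**depth equally sized Merkle tree leaves,
    - partitions are distributed uniformly across those leaves,
    - each mismatched partition falls in a distinct leaf (worst case),
    - every partition in a mismatched leaf gets streamed.
    """
    leaves = 2 ** depth
    # Number of partitions covered by a single leaf (rounded up).
    per_leaf = math.ceil(partitions_in_range / leaves)
    # A leaf can only be "dirty" once, so cap at the number of leaves.
    dirty_leaves = min(mismatched_partitions, leaves)
    # Never stream more partitions than the range actually holds.
    streamed = min(dirty_leaves * per_leaf, partitions_in_range)
    return {
        "leaves": leaves,
        "partitions_per_leaf": per_leaf,
        "partitions_streamed": streamed,
    }


if __name__ == "__main__":
    # Same data and same 100 inconsistent partitions, two tree depths.
    print(overstream_estimate(10_000_000, 10, 100))
    print(overstream_estimate(10_000_000, 20, 100))
```

With 10 million partitions and 100 mismatches, a depth-10 tree streams about 976,600 partitions in this model, while a depth-20 tree streams about 1,000. That is the intuition for why a better-sized tree can cut overstreaming so much, especially over a slow cross-DC link.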


I repair tables set by set (some sets of tables are more critical than others, etc.).
The most impressive result so far for a set is:
* Before: 23 days (days, not hours)
* With CASSANDRA-12580: 16 hours (yes, hours!)

The improvement is not always dramatic (e.g. 8 hours instead of 39 hours on another set) but
still significant and valuable.

Moreover, considering that:
* repair is a mandatory operation in many use cases
* Paulo already made the patch for 2.1
* C* 2.1 is widely used (the most used?)
I think this bugfix is critical from an Ops point of view and should land in 2.1.17 so that
it is available to people who don't deploy from source.

Best,

Romain
