cassandra-user mailing list archives

From Reid Pinchback <rpinchb...@tripadvisor.com>
Subject Re: Seeing tons of DigestMismatchException exceptions after upgrading from 2.2.13 to 3.11.4
Date Tue, 10 Dec 2019 16:28:12 GMT
Colleen, to your question: yes, there is a difference between 2.x and 3.x that would impact
repairs.  The Merkle tree computation changed to use a greater default tree depth.
That can cause significant memory pressure, to the point that nodes sometimes even OOM.  This
has been addressed in 4.x by making the setting tunable.  I think 3.11.5 now contains the same
fix as a back-patch.

From: Reid Pinchback <rpinchback@tripadvisor.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, December 10, 2019 at 11:23 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Seeing tons of DigestMismatchException exceptions after upgrading from 2.2.13
to 3.11.4

Carl, your speculation matches our observations, and we have a use case with that unfortunate
usage pattern.  Write-then-immediately-read is not friendly to eventually consistent data
stores.  It makes reads pay a tax that really belongs to the write activity.

From: Carl Mueller <carl.mueller@smartthings.com.INVALID>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, December 9, 2019 at 3:18 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Seeing tons of DigestMismatchException exceptions after upgrading from 2.2.13
to 3.11.4

My speculation on rapidly churning/fast reads of recently written data:

- data written at quorum (for RF3): write confirm is after two nodes reply
- data read very soon after (possibly code antipattern), and let's assume the third node update
hasn't completed yet (e.g. AWS network "variance"). The read will pick a replica, and then
there is a 50% chance the second replica chosen for quorum read is the stale node, which triggers
a DigestMismatch read repair.

Is that plausible?
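
As a rough check of the odds above, here is a minimal sketch that enumerates which two replicas
a quorum read might contact. It is an idealized model, not how a coordinator actually selects
replicas: the replica names are hypothetical, exactly one replica is assumed to be stale, and
every ordered pair is assumed equally likely.

```python
from fractions import Fraction
from itertools import permutations

REPLICAS = ("A", "B", "C")  # RF=3; replica names are hypothetical
STALE = "C"                 # assumed: the one replica that has not applied the write yet


def mismatch_probabilities(replicas=REPLICAS, stale=STALE):
    """Return (P(stale replica is contacted), P(second is stale | first is fresh))."""
    # Ordered pairs: (replica asked for data, replica asked for a digest).
    pairs = list(permutations(replicas, 2))
    any_stale = [p for p in pairs if stale in p]
    first_fresh = [p for p in pairs if p[0] != stale]
    first_fresh_second_stale = [p for p in first_fresh if p[1] == stale]
    return (
        Fraction(len(any_stale), len(pairs)),
        Fraction(len(first_fresh_second_stale), len(first_fresh)),
    )


uniform, given_first_fresh = mismatch_probabilities()
print(uniform, given_first_fresh)  # prints: 2/3 1/2
```

Under these assumptions the 50% figure holds when the first replica contacted is up to date. If
the stale replica can also be the first one contacted, the chance that a read touches it rises
to 2 in 3.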

The code seems to log the exception in all read repair instances, so it doesn't seem to be
an ERROR with red blaring klaxons, maybe it should be a WARN?

On Mon, Nov 25, 2019 at 11:12 AM Colleen Velo <cmvelo@gmail.com> wrote:
Hello,

As part of the final stages of our 2.2 --> 3.11 upgrades, one of our clusters (on AWS,
18 nodes, m4.2xlarge) produced some post-upgrade fits. We started getting spikes of Cassandra
read and write timeouts even though the overall request volumes were unchanged. As part
of the upgrade process, we used a facade implementation to help change the namespace of the
compaction class for a TWCS table, but that table has very low query volume.

The DigestMismatchException error messages (based on sampling the hash keys and finding which
tables have partitions for that hash key) seem to be occurring on the heaviest-volume table
(approximately 4,000 reads and 1,600 writes per second per node), and that table has semi-medium
row widths with about 10-40 column keys (or at least the digest mismatch partitions have
that type of width). The keyspace is RF3 using NetworkTopology, and the CL is QUORUM for both
reads and writes.

We have experienced the DigestMismatchException errors on all 3 of the Production clusters
that we have upgraded (all of them are single DC in the us-east-1/eu-west-1/ap-northeast-2
AWS regions), and in all three cases those DigestMismatchException errors were not there in
either the 2.1.x or 2.2.x versions of Cassandra.

Does anyone know of changes from 2.2 to 3.11 that would produce additional timeout problems,
such as heavier blocking read repair logic?

We ran repairs (via reaper v1.4.8, much nicer in 3.11 than 2.1) on all of the tables and
across all of the nodes, and our timeouts seem to have disappeared, but we continue to see
a rapid stream of digest mismatch exceptions, so much so that our Cassandra debug
logs are rolling over every 15 minutes. There is a mailing list post from 2018 that indicates
that some DigestMismatchException error messages are natural if you are reading while writing,
but the sheer volume that we are getting is very concerning:
 - https://www.mail-archive.com/user@cassandra.apache.org/msg56078.html

Is that level of DigestMismatchException unusual? Or can that volume of mismatches appear
if semi-wide rows simply require a lot of resolution because flurries of quorum reads/writes
(RF3) on recent partitions have a decent chance of not having fully synced data on the replica
reads? Does the digest mismatch error get debug-logged on every chance read repair?
Also, why are these DigestMismatchExceptions only occurring now that the upgrade to 3.11 has
happened?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sample DigestMismatchException error message:
    DEBUG [ReadRepairStage:13] 2019-11-22 01:38:14,448 ReadCallback.java:242 - Digest mismatch:
    org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-6492169518344121155,
66306139353831322d323064382d313037322d663965632d636565663165326563303965) (be2c0feaa60d99c388f9d273fdc360f7
vs 09eaded2d69cf2dd49718076edf56b36)
        at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92)
~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233)
~[apache-cassandra-3.11.4.jar:3.11.4]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_77]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_77]
        at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
[apache-cassandra-3.11.4.jar:3.11.4]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
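
When sampling keys from these messages, it can help to turn the hex bytes inside
DecoratedKey(...) back into a readable partition key. The following is a minimal sketch, not an
official tool: the helper name decode_decorated_key is hypothetical, and it assumes the
partition key is a single text column, so that its bytes are plain UTF-8. A composite or
non-text key would need different decoding.

```python
import re

# Matches "DecoratedKey(<token>, <hex key bytes>)" as printed in the sample above.
DECORATED_KEY = re.compile(r"DecoratedKey\((-?\d+),\s*([0-9a-fA-F]+)\)")


def decode_decorated_key(log_text):
    """Return (token, key_text) for the first DecoratedKey in log_text, or None."""
    match = DECORATED_KEY.search(log_text)
    if match is None:
        return None
    token = int(match.group(1))
    # Assumption: the partition key is one text column, so its bytes decode as UTF-8.
    key_text = bytes.fromhex(match.group(2)).decode("utf-8")
    return token, key_text


sample = (
    "Mismatch for key DecoratedKey(-6492169518344121155, "
    "66306139353831322d323064382d313037322d663965632d636565663165326563303965)"
)
print(decode_decorated_key(sample))
```

For the sample message above, the key bytes decode to the text
f0a95812-20d8-1072-f9ec-ceef1e2ec09e, which can then be looked up in the candidate tables.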

Cluster(s) setup:
    * AWS region: eu-west-1:
        — Nodes: 18
        — single DC
        — keyspace: RF3 using NetworkTopology

    * AWS region: us-east-1:
        — Nodes: 20
        — single DC
        — keyspace: RF3 using NetworkTopology

    * AWS region: ap-northeast-2:
        — Nodes: 30
        — single DC
        — keyspace: RF3 using NetworkTopology

Thanks for any insight into this issue.

--
Colleen Velo
email: cmvelo@gmail.com