From: Ben Mills <ben@bitbrew.com>
Date: Thu, 24 Oct 2019 17:11:11 -0400
Subject: Re: Repair Issues
To: user@cassandra.apache.org

Hi Reid,

Many thanks - I have seen that article, though I will definitely give it another read.

Note that nodetool scrub has been tried (no effect), and sstablescrub cannot currently be run with the Cassandra image in use (though a new image that allows the server to be stopped while keeping the operating environment available for this utility can certainly be built - we just haven't done so yet). Note also that none of the logs indicate that a corrupt data file (or files) is in play here. I mention that because the article includes a solution in which a specific data file is manually deleted and repairs are then run to restore the file from a different node in the cluster. Also, the way persistent volumes are mounted onto [Kubernetes] nodes prevents this solution (manual deletion of an offending data file) from being viable, because the PV mount on the node's filesystem is detached when the pods are down. This is a subtlety of running Cassandra in Kubernetes.
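
For what it's worth, the rough idea for that new image - purely a sketch at this point, nothing below exists yet - is to make PID 1 a small wrapper script rather than the Cassandra JVM itself, so the JVM can be stopped from a kubectl exec session while the container (and therefore the PV mount) stays up:

    #!/bin/sh
    # Sketch of a wrapper entrypoint (not in use yet).
    # PID 1 is this shell, so stopping Cassandra does not terminate the container.
    cassandra -R -f &
    # Keep the container alive whether or not the Cassandra child is running,
    # leaving the environment available for sstablescrub and similar tools.
    while true; do sleep 3600; done

With something along those lines, the sequence would presumably be: exec into the pod, nodetool drain, stop the JVM, run sstablescrub against the still-mounted data volume, then restart Cassandra (or delete the pod and let the StatefulSet recreate it). The liveness/readiness probes would also need to tolerate the window in which Cassandra is down.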

On Thu, Oct 24, 2019 at 4:24 PM Reid Pinchback <rpinchback@tripadvisor.com> wrote:

Ben, you may find this helpful:

https://blog.pythian.com/so-you-have-a-broken-cassandra-sstable-file/

From: Ben Mills <ben@bitbrew.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, October 24, 2019 at 3:31 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Repair Issues

Greetings,

Inherited a small Cassandra cluster with some repair issues and need some advice on recommended next steps. Apologies in advance for a long email.

Issue:

Intermittent repair failures on two non-system keyspaces.

- platform_users
- platform_management

Repair Type:

Full, parallel repairs are run on each of the three nodes every five days.
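
For reference, the scheduled job on each node runs approximately the following (paraphrasing the script - the options match those echoed in the output below):

    # Full (non-incremental), parallel repair of each non-system keyspace:
    nodetool repair --full platform_users
    nodetool repair --full platform_management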

Repair command output for a typical failure:

[2019-10-18 00:22:09,109] Starting repair command #46, repairing keyspace platform_users with repair options (parallelism: parallel, primary range: false, incremental: false, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 12)
[2019-10-18 00:22:09,242] Repair session 5282be70-f13d-11e9-9b4e-7f6db768ba9a for range [(-1890954128429545684,2847510199483651721], (8249813014782655320,-8746483007209345011], (4299912178579297893,6811748355903297393], (-8746483007209345011,-8628999431140554276], (-5865769407232506956,-4746990901966533744], (-4470950459111056725,-1890954128429545684], (4001531392883953257,4299912178579297893], (6811748355903297393,6878104809564599690], (6878104809564599690,8249813014782655320], (-4746990901966533744,-4470950459111056725], (-8628999431140554276,-5865769407232506956], (2847510199483651721,4001531392883953257]] failed with error [repair #5282be70-f13d-11e9-9b4e-7f6db768ba9a on platform_users/access_tokens_v2, [(-1890954128429545684,2847510199483651721], (8249813014782655320,-8746483007209345011], (4299912178579297893,6811748355903297393], (-8746483007209345011,-8628999431140554276], (-5865769407232506956,-4746990901966533744], (-4470950459111056725,-1890954128429545684], (4001531392883953257,4299912178579297893], (6811748355903297393,6878104809564599690], (6878104809564599690,8249813014782655320], (-4746990901966533744,-4470950459111056725], (-8628999431140554276,-5865769407232506956], (2847510199483651721,4001531392883953257]]] Validation failed in /10.x.x.x (progress: 26%)
[2019-10-18 00:22:09,246] Some repair failed
[2019-10-18 00:22:09,248] Repair command #46 finished in 0 seconds
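
When one of these failures appears, the only other signal so far has come from grepping the system log on the node the error points at for anything suggesting a corrupt SSTable, roughly as below (the log path is an assumption about the image layout) - and nothing of that sort has shown up:

    # Illustrative log location; adjust for the actual container layout.
    grep -iE "validation|corrupt" /var/log/cassandra/system.log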

Additional Notes:

Repairs encounter the above failures more often than not - sometimes on one node only, though occasionally on two; sometimes on just one of the two keyspaces, sometimes on both. Apparently the previous repair schedule for this cluster included incremental repairs (a script alternated between incremental and full repairs). After reading this TLP article:

https://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html

the repair script was replaced with cassandra-reaper (v1.4.0), which was run with its default configs. Reaper was fine, but it only obscured the ongoing issues (it did not resolve them) and complicated the debugging process, and so it was then removed. The current repair schedule is as described above under Repair Type.
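
One loose end from that history, noted here but not acted on: the earlier incremental runs will have left some SSTables marked as repaired. If that ever needs to be unwound, the repairedAt metadata can be reset with the sstablerepairedset tool, along the lines of the sketch below - though, like sstablescrub, it requires Cassandra to be stopped on the node, so it runs into the same container constraint described under (2) below.

    # Sketch only (not attempted): mark a node's SSTables as unrepaired after
    # abandoning incremental repair. Cassandra must be stopped on the node first.
    find /cassandra_data/data/platform_users -name "*-Data.db" > /tmp/sstables.txt
    sstablerepairedset --really-set --is-unrepaired -f /tmp/sstables.txt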

Attempts at Resolution:

(1) nodetool scrub was attempted on the offending keyspaces/tables to no effect.
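
The invocations were along these lines, e.g. for the table named in the failure above:

    nodetool scrub platform_users access_tokens_v2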

(2) sstablescrub has not been attempted due to the current design of the Docker image that runs Cassandra in each Kubernetes pod - i.e. there is no way to stop the server to run this utility without killing the only pid running in the container.

Related Error:

Not sure if this is related, though sometimes, when either:

(a) Running nodetool snapshot, or
(b) Rolling a pod that runs a Cassandra node, which calls nodetool drain prior to shutdown,

the following error is thrown:

-- StackTrace --
java.lang.RuntimeException: Last written key DecoratedKey(10df3ba1-6eb2-4c8e-bddd-c0c7af586bda, 10df3ba16eb24c8ebdddc0c7af586bda) >= current key DecoratedKey(00000000-0000-0000-0000-000000000000, 17343121887f480c9ba87c0e32206b74) writing into /cassandra_data/data/platform_management/device_by_tenant_v2-e91529202ccf11e7ab96d5693708c583/.device_by_tenant_tags_idx/mb-45-big-Data.db
        at org.apache.cassandra.io.sstable.format.big.BigTableWriter.beforeAppend(BigTableWriter.java:114)
        at org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:153)
        at org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.append(SimpleSSTableMultiWriter.java:48)
        at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:441)
        at org.apache.cassandra.db.Memtable$FlushRunnable.call(Memtable.java:477)
        at org.apache.cassandra.db.Memtable$FlushRunnable.call(Memtable.java:363)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Here are some details on the environment and configs in the event that something is relevant.

Environment: Kubernetes
Environment Config: Stateful set of 3 replicas
Storage: Persistent Volumes
Storage Class: SSD
Node OS: Container-Optimized OS
Container OS: Ubuntu 16.04.3 LTS

Version: Cassandra 3.7
Data Centers: 1
Racks: 3 (one per zone)
Nodes: 3
Tokens: 4
Replication Factor: 3
Replication Strategy: NetworkTopologyStrategy (all keyspaces)
Compaction Strategy: STCS (all tables)
Read/Write Requirements: Blend of both
Data Load: <1GB per node
gc_grace_seconds: default (10 days - all tables)

Memory: 4Gi per node
CPU: 3.5 per node (3500m)

Java Version: 1.8.0_144

Heap Settings:

-XX:+UnlockExperimentalVMOptions
-XX:+UseCGroupMemoryLimitForHeap
-XX:MaxRAMFraction=2
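
(With the 4Gi container memory limit above, UseCGroupMemoryLimitForHeap plus MaxRAMFraction=2 works out to a maximum heap of roughly 2GB.)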

GC Settings: (CMS)

-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=30000
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways

Any ideas are much appreciated.