cassandra-user mailing list archives

From Ben Mills <...@bitbrew.com>
Subject Re: Repair Issues
Date Thu, 24 Oct 2019 21:25:42 GMT
Thanks Jon!

This is very helpful - allow me to follow up and ask a question.

(1) Yes, incremental repairs will never be used (unless they become viable
in Cassandra 4.x someday).
(2) I hear you on the JVM - will look into that.
(3) Been looking at Cassandra version 3.11.x though was unaware that 3.7 is
considered non-viable for production use.

For (4) - Question/Request:

Note that with:

-XX:MaxRAMFraction=2

the actual amount of memory allocated for heap space is effectively 2Gi
(i.e. half of the 4Gi allocated on the machine type). We can definitely
increase memory (for heap and nonheap), though can you expand a bit on your
heap comment to help my understanding (as this is such a small cluster with
such a small amount of data at rest)?
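
To make the arithmetic explicit, here is a minimal sketch. It assumes that, with -XX:+UseCGroupMemoryLimitForHeap, the JVM derives its default max heap by dividing the container memory limit by MaxRAMFraction; the real JVM may round or align the value slightly. The function name is hypothetical, and the 8Gi and 12Gi rows simply apply the same fraction to the memory sizes suggested in the reply quoted below.

```python
GIB = 1024 ** 3  # bytes in one GiB


def approx_max_heap_bytes(container_limit_bytes: int, max_ram_fraction: int) -> int:
    """Approximate default max heap: container memory limit / MaxRAMFraction."""
    return container_limit_bytes // max_ram_fraction


# Current 4Gi limit, plus the 8Gi and 12Gi sizes suggested in the thread,
# all assuming MaxRAMFraction stays at 2.
for limit_gib in (4, 8, 12):
    heap = approx_max_heap_bytes(limit_gib * GIB, 2)
    print(f"{limit_gib}Gi container limit -> ~{heap / GIB:.0f}Gi max heap")
```

Under these assumptions the current pods get roughly a 2Gi heap, not 4Gi.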

Thanks again.

On Thu, Oct 24, 2019 at 5:11 PM Jon Haddad <jon@jonhaddad.com> wrote:

> There are some major warning signs for me with your environment.  4GB heap
> is too low, and Cassandra 3.7 isn't something I would put into production.
>
> Your surface area for problems is massive right now.  Things I'd do:
>
> 1. Never use incremental repair.  Seems like you've already stopped doing
> them, but it's worth mentioning.
> 2. Upgrade to the latest JVM, that version's way out of date.
> 3. Upgrade to Cassandra 3.11.latest (we're voting on 3.11.5 right now).
> 4. Increase memory to 8GB minimum, preferably 12.
>
> I usually don't like making a bunch of changes without knowing the root
> cause of a problem, but in your case there are so many potential problems I
> don't think it's worth digging into, especially since the problem might be
> one of the 500 or so bugs that were fixed since this release.
>
> Once you've done those things it'll be easier to narrow down the problem.
>
> Jon
>
>
> On Thu, Oct 24, 2019 at 4:59 PM Ben Mills <ben@bitbrew.com> wrote:
>
>> Hi Sergio,
>>
>> No, not at this time.
>>
>> It was in use with this cluster previously, and while there were no
>> reaper-specific issues, it was removed to help simplify investigation of
>> the underlying repair issues I've described.
>>
>> Thanks.
>>
>> On Thu, Oct 24, 2019 at 4:21 PM Sergio <lapostadisergio@gmail.com> wrote:
>>
>>> Are you using Cassandra reaper?
>>>
>>> On Thu, Oct 24, 2019, 12:31 PM Ben Mills <ben@bitbrew.com> wrote:
>>>
>>>> Greetings,
>>>>
>>>> Inherited a small Cassandra cluster with some repair issues and need
>>>> some advice on recommended next steps. Apologies in advance for a long
>>>> email.
>>>>
>>>> Issue:
>>>>
>>>> Intermittent repair failures on two non-system keyspaces.
>>>>
>>>> - platform_users
>>>> - platform_management
>>>>
>>>> Repair Type:
>>>>
>>>> Full, parallel repairs are run on each of the three nodes every five
>>>> days.
>>>>
>>>> Repair command output for a typical failure:
>>>>
>>>> [2019-10-18 00:22:09,109] Starting repair command #46, repairing
>>>> keyspace platform_users with repair options (parallelism: parallel, primary
>>>> range: false, incremental: false, job threads: 1, ColumnFamilies: [],
>>>> dataCenters: [], hosts: [], # of ranges: 12)
>>>> [2019-10-18 00:22:09,242] Repair session
>>>> 5282be70-f13d-11e9-9b4e-7f6db768ba9a for range
>>>> [(-1890954128429545684,2847510199483651721],
>>>> (8249813014782655320,-8746483007209345011],
>>>> (4299912178579297893,6811748355903297393],
>>>> (-8746483007209345011,-8628999431140554276],
>>>> (-5865769407232506956,-4746990901966533744],
>>>> (-4470950459111056725,-1890954128429545684],
>>>> (4001531392883953257,4299912178579297893],
>>>> (6811748355903297393,6878104809564599690],
>>>> (6878104809564599690,8249813014782655320],
>>>> (-4746990901966533744,-4470950459111056725],
>>>> (-8628999431140554276,-5865769407232506956],
>>>> (2847510199483651721,4001531392883953257]] failed with error [repair
>>>> #5282be70-f13d-11e9-9b4e-7f6db768ba9a on platform_users/access_tokens_v2,
>>>> [(-1890954128429545684,2847510199483651721],
>>>> (8249813014782655320,-8746483007209345011],
>>>> (4299912178579297893,6811748355903297393],
>>>> (-8746483007209345011,-8628999431140554276],
>>>> (-5865769407232506956,-4746990901966533744],
>>>> (-4470950459111056725,-1890954128429545684],
>>>> (4001531392883953257,4299912178579297893],
>>>> (6811748355903297393,6878104809564599690],
>>>> (6878104809564599690,8249813014782655320],
>>>> (-4746990901966533744,-4470950459111056725],
>>>> (-8628999431140554276,-5865769407232506956],
>>>> (2847510199483651721,4001531392883953257]]] Validation failed in /10.x.x.x
>>>> (progress: 26%)
>>>> [2019-10-18 00:22:09,246] Some repair failed
>>>> [2019-10-18 00:22:09,248] Repair command #46 finished in 0 seconds
>>>>
>>>> Additional Notes:
>>>>
>>>> Repairs encounter the above failures more often than not. Sometimes on one
>>>> node only, though occasionally on two. Sometimes just one of the two
>>>> keyspaces, sometimes both. Apparently the previous repair schedule for
>>>> this cluster included incremental repairs (script alternated between
>>>> incremental and full repairs). After reading this TLP article:
>>>>
>>>>
>>>> https://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html
>>>>
>>>> the repair script was replaced with cassandra-reaper (v1.4.0), which
>>>> was run with its default configs. Reaper was fine but only obscured the
>>>> ongoing issues (it did not resolve them) and complicated the debugging
>>>> process and so was then removed. The current repair schedule is as
>>>> described above under Repair Type.
>>>>
>>>> Attempts at Resolution:
>>>>
>>>> (1) nodetool scrub was attempted on the offending keyspaces/tables to
>>>> no effect.
>>>>
>>>> (2) sstablescrub has not been attempted due to the current design of
>>>> the Docker image that runs Cassandra in each Kubernetes pod - i.e. there is
>>>> no way to stop the server to run this utility without killing the only pid
>>>> running in the container.
>>>>
>>>> Related Error:
>>>>
>>>> Not sure if this is related, though sometimes, when either:
>>>>
>>>> (a) Running nodetool snapshot, or
>>>> (b) Rolling a pod that runs a Cassandra node, which calls nodetool
>>>> drain prior to shutdown,
>>>>
>>>> the following error is thrown:
>>>>
>>>> -- StackTrace --
>>>> java.lang.RuntimeException: Last written key
>>>> DecoratedKey(10df3ba1-6eb2-4c8e-bddd-c0c7af586bda,
>>>> 10df3ba16eb24c8ebdddc0c7af586bda) >= current key
>>>> DecoratedKey(00000000-0000-0000-0000-000000000000,
>>>> 17343121887f480c9ba87c0e32206b74) writing into
>>>> /cassandra_data/data/platform_management/device_by_tenant_v2-e91529202ccf11e7ab96d5693708c583/.device_by_tenant_tags_idx/mb-45-big-Data.db
>>>>             at org.apache.cassandra.io.sstable.format.big.BigTableWriter.beforeAppend(BigTableWriter.java:114)
>>>>             at org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:153)
>>>>             at org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.append(SimpleSSTableMultiWriter.java:48)
>>>>             at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:441)
>>>>             at org.apache.cassandra.db.Memtable$FlushRunnable.call(Memtable.java:477)
>>>>             at org.apache.cassandra.db.Memtable$FlushRunnable.call(Memtable.java:363)
>>>>             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>             at java.lang.Thread.run(Thread.java:748)
>>>>
>>>> Here are some details on the environment and configs in the event that
>>>> something is relevant.
>>>>
>>>> Environment: Kubernetes
>>>> Environment Config: Stateful set of 3 replicas
>>>> Storage: Persistent Volumes
>>>> Storage Class: SSD
>>>> Node OS: Container-Optimized OS
>>>> Container OS: Ubuntu 16.04.3 LTS
>>>>
>>>> Version: Cassandra 3.7
>>>> Data Centers: 1
>>>> Racks: 3 (one per zone)
>>>> Nodes: 3
>>>> Tokens: 4
>>>> Replication Factor: 3
>>>> Replication Strategy: NetworkTopologyStrategy (all keyspaces)
>>>> Compaction Strategy: STCS (all tables)
>>>> Read/Write Requirements: Blend of both
>>>> Data Load: <1GB per node
>>>> gc_grace_seconds: default (10 days - all tables)
>>>>
>>>> Memory: 4Gi per node
>>>> CPU: 3.5 per node (3500m)
>>>>
>>>> Java Version: 1.8.0_144
>>>>
>>>> Heap Settings:
>>>>
>>>> -XX:+UnlockExperimentalVMOptions
>>>> -XX:+UseCGroupMemoryLimitForHeap
>>>> -XX:MaxRAMFraction=2
>>>>
>>>> GC Settings: (CMS)
>>>>
>>>> -XX:+UseParNewGC
>>>> -XX:+UseConcMarkSweepGC
>>>> -XX:+CMSParallelRemarkEnabled
>>>> -XX:SurvivorRatio=8
>>>> -XX:MaxTenuringThreshold=1
>>>> -XX:CMSInitiatingOccupancyFraction=75
>>>> -XX:+UseCMSInitiatingOccupancyOnly
>>>> -XX:CMSWaitDuration=30000
>>>> -XX:+CMSParallelInitialMarkEnabled
>>>> -XX:+CMSEdenChunksRecordAlways
>>>>
>>>> Any ideas are much appreciated.
>>>>
>>>
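
As a side note on the repair output quoted above: the session lists 12 ranges, which matches 3 nodes x 4 tokens. With a replication factor of 3 on three nodes and "primary range: false", one would expect those ranges to cover the whole token ring. The sketch below checks that under stated assumptions: each range is a half-open (start, end] pair copied from the output, and a full ring means the ranges chain end-to-start into one closed loop with exactly one range wrapping around. The function and variable names are hypothetical, not part of any Cassandra tooling.

```python
# (start, end] token ranges copied from the repair session output above.
RANGES = [
    (-1890954128429545684, 2847510199483651721),
    (8249813014782655320, -8746483007209345011),
    (4299912178579297893, 6811748355903297393),
    (-8746483007209345011, -8628999431140554276),
    (-5865769407232506956, -4746990901966533744),
    (-4470950459111056725, -1890954128429545684),
    (4001531392883953257, 4299912178579297893),
    (6811748355903297393, 6878104809564599690),
    (6878104809564599690, 8249813014782655320),
    (-4746990901966533744, -4470950459111056725),
    (-8628999431140554276, -5865769407232506956),
    (2847510199483651721, 4001531392883953257),
]


def covers_full_ring(ranges):
    """True if the (start, end] ranges tile the token ring exactly once."""
    if not ranges:
        return False
    next_token = dict(ranges)  # start token -> end token
    if len(next_token) != len(ranges):
        return False  # two ranges share a start token
    # Exactly one range may wrap past the maximum token.
    if sum(1 for start, end in ranges if start >= end) != 1:
        return False
    # Follow the chain from the first range until it breaks or closes.
    first = ranges[0][0]
    seen = set()
    token = first
    while token in next_token and token not in seen:
        seen.add(token)
        token = next_token[token]
    return len(seen) == len(ranges) and token == first


print(len(RANGES), covers_full_ring(RANGES))
```

If the check passes, the failing session was attempting to validate the node's entire replicated data set for that table, not just a slice of it.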
