cassandra-user mailing list archives

From "Steinmaurer, Thomas"
Subject Data Corruption due to multiple Cassandra 2.1 processes?
Date Tue, 07 Aug 2018 05:57:43 GMT

With 2.1, if a second Cassandra process/instance is accidentally started on a host,
can this result in some sort of corruption, even though Cassandra will eventually exit
because it cannot bind TCP ports that are already in use?
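
To make the port conflict concrete, here is a minimal sketch of a pre-flight check that a launcher script could run before starting a Cassandra JVM. It is not part of Cassandra: the function names are hypothetical, and the host 127.0.0.1 used in the note below is an assumption, since the real listen_address is redacted in the logs. Only the port numbers 7199 (JMX) and 7000 (storage) come from the log excerpts in this message.

```python
import os
import socket


def port_in_use(host: str, port: int) -> bool:
    """Return True if binding a TCP socket to (host, port) fails."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        if os.name == "posix":
            # Avoid false positives from old connections in TIME_WAIT.
            # An active listener on the same address still blocks the bind.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
        except OSError:
            return True
        return False


def busy_ports(host: str, ports: list[int]) -> list[int]:
    """Return the subset of ports that cannot be bound on host."""
    return [port for port in ports if port_in_use(host, port)]
```

A wrapper could call `busy_ports("127.0.0.1", [7199, 7000])` and refuse to launch if the result is non-empty. The check is inherently racy (a port can be taken between the probe and the real bind), so it only narrows the window for an accidental second start.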

What we have seen in this scenario is something like this:

ERROR [main] 2018-08-05 21:10:24,046 - Error starting local jmx server:
java.rmi.server.ExportException: Port already in use: 7199; nested exception is:
       Address already in use (Bind failed)

But it then continues with things like opening system and even user tables:

INFO  [main] 2018-08-05 21:10:24,060 - Initializing key cache with capacity
of 100 MBs.
INFO  [main] 2018-08-05 21:10:24,067 - Initializing row cache with capacity
of 0 MBs
INFO  [main] 2018-08-05 21:10:24,073 - Initializing counter cache with
capacity of 50 MBs
INFO  [main] 2018-08-05 21:10:24,074 - Scheduling counter cache save
to every 7200 seconds (going to save all keys).
INFO  [main] 2018-08-05 21:10:24,161 - Initializing system.sstable_activity
INFO  [SSTableBatchOpen:2] 2018-08-05 21:10:24,692 - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-165
(2023 bytes)
INFO  [SSTableBatchOpen:3] 2018-08-05 21:10:24,692 - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-167
(2336 bytes)
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,692 - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-166
(2686 bytes)
INFO  [main] 2018-08-05 21:10:24,755 - Initializing system.hints
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,758 - Opening /var/opt/xxx-managed/cassandra/system/hints-2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-377
(46210621 bytes)
INFO  [main] 2018-08-05 21:10:24,766 - Initializing system.compaction_history
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,768 - Opening /var/opt/xxx-managed/cassandra/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-129
(91269 bytes)

Replaying commit logs:

INFO  [main] 2018-08-05 21:10:25,896 - Replaying /var/opt/dynatrace-managed/cassandra/commitlog/CommitLog-4-1533133668366.log
INFO  [main] 2018-08-05 21:10:25,896 - Replaying /var/opt/dynatrace-managed/cassandra/commitlog/CommitLog-4-1533133668366.log
(CL version 4, messaging version 8)

It even starts flushing memtables (only system tables are pasted below, but user tables are flushed as well):

INFO  [MemtableFlushWriter:4] 2018-08-05 21:11:52,524 - Writing Memtable-size_estimates@1941663179(2.655MiB
serialized bytes, 325710 ops, 2%/0% of on/off-heap limit)
INFO  [MemtableFlushWriter:3] 2018-08-05 21:11:52,552 - Writing Memtable-peer_events@1474667699(0.199KiB
serialized bytes, 4 ops, 0%/0% of on/off-heap limit)

This goes on until it reaches the point where it can't bind ports such as the storage port 7000:

ERROR [main] 2018-08-05 21:11:54,350 - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: /XXX:7000 is in use by another process.
 Change listen_address:storage_port in cassandra.yaml to values that do not conflict with
other services

Until Cassandra stops:

INFO  [StorageServiceShutdownHook] 2018-08-05 21:11:54,361 - Announcing

So, we have around 2 minutes (from 21:10:24 to 21:11:54 in the logs above) during which the second Cassandra process is touching existing data, although it shouldn't.
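
The timestamps quoted above put a number on that window. The sketch below parses three of them (the JMX bind error, the first memtable flush, and the fatal configuration error) and computes the elapsed time. The format string is an assumption based on how the timestamps appear in the pasted log lines, and the variable names are only illustrative.

```python
from datetime import datetime

# Timestamp layout as it appears in the pasted log lines (comma before millis).
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(text: str) -> datetime:
    """Parse a timestamp such as '2018-08-05 21:10:24,046'."""
    return datetime.strptime(text, LOG_TS_FORMAT)


jmx_bind_error = parse_ts("2018-08-05 21:10:24,046")
first_flush = parse_ts("2018-08-05 21:11:52,524")
fatal_error = parse_ts("2018-08-05 21:11:54,350")

# Seconds between the first sign of trouble and the fatal error.
window_seconds = (fatal_error - jmx_bind_error).total_seconds()
# Seconds between the first sign of trouble and the first memtable flush.
flush_delay_seconds = (first_flush - jmx_bind_error).total_seconds()

print(f"window: {window_seconds:.3f} s, first flush after: {flush_delay_seconds:.3f} s")
```

By these timestamps the second process ran for roughly 90 seconds before the fatal error, and the first quoted flush happened about 88 seconds in, so the "around 2 minutes" estimate is in the right range.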

This sounds like a potential candidate for data corruption, right? For example, later on we see things
like the following (possibly while the shutdown is still in progress?):

WARN  [SharedPool-Worker-1] 2018-08-05 21:11:58,181
- Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
java.lang.RuntimeException: /var/opt/xxx-managed/cassandra/xxx/xxx-fdc68b70950611e8ad7179f2d5bfa3cf/xxx-xxx-ka-15-Data.db
(No such file or directory)

I found this one here:

So, if this all leads to corruption, might this be a candidate for a backport for a 2.1 bugfix release?
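
Independent of any fix inside Cassandra itself, an accidental second start could also be blocked at the wrapper-script level with an exclusive lock file. The following is a minimal sketch of that general technique, not Cassandra's own mechanism and not necessarily what the change referred to above does. It assumes a POSIX host (it uses `fcntl.flock`), and the function name and lock file path are hypothetical.

```python
import fcntl
import os
from typing import Optional


def acquire_instance_lock(lock_path: str) -> Optional[int]:
    """Try to take an exclusive, non-blocking lock on lock_path.

    Returns the open file descriptor on success (keep it open for the
    lifetime of the process), or None if another holder has the lock.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else already holds the lock: do not start a second instance.
        os.close(fd)
        return None
    return fd
```

A launcher would call this with a path inside the data directory (for example a hypothetical `instance.lock` file) before touching any sstables or commit logs, and exit immediately if it returns None. The kernel releases the lock when the holding process exits, so a crashed instance does not leave a stale lock behind.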

Thanks a lot!


The contents of this e-mail are intended for the named addressee only. It contains information
that may be confidential. Unless you are the named addressee or an authorized designee, you
may not copy or use it, or disclose it to anyone else. If you received it in error please
notify us immediately and then destroy it. Dynatrace Austria GmbH (registration number FN
91482h) is a company registered in Linz whose registered office is at 4040 Linz, Austria,
Freistädterstraße 313
