cassandra-user mailing list archives

From Stefano Ortolani <ostef...@gmail.com>
Subject Data corruption, invalid UTF-8 bytes
Date Tue, 02 Jan 2018 17:44:48 GMT
Hi all,

Apparently the year started with a node (version 3.0.15) exhibiting some
data corruption (discovered by a Spark job enumerating all keys).

The exception is attached below.

The invalid string is a partition key, and it is supposed to be a file
name. If I manually decode the bytes I get something that resembles a path
but with lots of garbage inside.
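
For anyone who wants to repeat the manual decoding, here is a minimal Python
sketch. The helper names (first_invalid_utf8, readable) are made up for
illustration, and the input is only a short excerpt of the hex dump in the
exception below; the full string can be pasted in its place. It reports the
offset of the first byte a strict UTF-8 decoder rejects and prints the rest
with undecodable bytes escaped.

```python
from typing import Optional


def first_invalid_utf8(data: bytes) -> Optional[int]:
    """Return the offset of the first byte a strict UTF-8 decoder rejects, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def readable(data: bytes) -> str:
    """Decode what can be decoded; show undecodable bytes as \\xNN escapes."""
    return data.decode("utf-8", errors="backslashreplace")


# Short excerpt of the hex dump from the exception (not the whole key).
excerpt = bytes.fromhex("5c203231edb9b9235c")

print("first invalid offset:", first_invalid_utf8(excerpt))
print("readable form:", readable(excerpt))
```

In this excerpt the rejected bytes are ed b9 b9, which is the byte pattern of
a lone UTF-16 surrogate (U+DE79) encoded as if it were a normal code point.
Strict UTF-8 decoders refuse that, while some lenient encoders will happily
produce it, so this by itself does not prove on-disk corruption.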

Now, part of the garbage might be intentional, which means I am still
trying to work out whether this is actual data corruption or whether the
input string was already corrupted. In the latter case, though, the string
should not have been inserted: the driver or Cassandra should have rejected
the WRITE, am I correct?

I admit I am a bit stuck, because obviously nodetool getsstables doesn't
work with such a key.
Does anybody have any suggestions on how to deal with this situation?

Ideally I would like to identify the sstable, but also check that the
corruption did not replicate to other nodes, and possibly delete the
offending partition.

A starting point would be to generate the token associated with the
offending key. Since I am using the Murmur3 partitioner, I was planning to
just compute the token value of that byte sequence. Would this be a sound
approach?
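
To make the question concrete, this is roughly what I have in mind, as a
Python sketch. It rests on assumptions that should be verified: that the
Murmur3 partitioner takes the first 64 bits of MurmurHash3 x64 128 (seed 0)
over the raw key bytes, that Cassandra's variant sign-extends the bytes of
the final partial block (which only matters for bytes >= 0x80), and that the
table has a single-column partition key (a composite key is serialized
differently before hashing). The function names are mine. Before trusting
the result, compare it with SELECT token(...) for a key that is known to be
good.

```python
MASK64 = (1 << 64) - 1
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F


def _rotl64(x: int, r: int) -> int:
    return ((x << r) | (x >> (64 - r))) & MASK64


def _mix_k1(k: int) -> int:
    k = (k * C1) & MASK64
    k = _rotl64(k, 31)
    return (k * C2) & MASK64


def _mix_k2(k: int) -> int:
    k = (k * C2) & MASK64
    k = _rotl64(k, 33)
    return (k * C1) & MASK64


def _fmix(k: int) -> int:
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def murmur3_h1(key: bytes) -> int:
    """First 64 bits (unsigned) of MurmurHash3 x64 128 with seed 0."""
    length = len(key)
    nblocks = length // 16
    h1 = h2 = 0
    for i in range(nblocks):
        # Full 16-byte blocks are read as two little-endian unsigned longs.
        k1 = int.from_bytes(key[16 * i:16 * i + 8], "little")
        k2 = int.from_bytes(key[16 * i + 8:16 * i + 16], "little")
        h1 ^= _mix_k1(k1)
        h1 = _rotl64(h1, 27)
        h1 = (h1 + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64
        h2 ^= _mix_k2(k2)
        h2 = _rotl64(h2, 31)
        h2 = (h2 + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64
    tail = key[16 * nblocks:]
    k1 = k2 = 0
    for i, b in enumerate(tail):
        # Assumption: tail bytes are treated as signed (sign-extended) bytes.
        signed = b - 256 if b > 127 else b
        v = (signed << (8 * (i % 8))) & MASK64
        if i >= 8:
            k2 ^= v
        else:
            k1 ^= v
    if len(tail) > 8:
        h2 ^= _mix_k2(k2)
    if len(tail) > 0:
        h1 ^= _mix_k1(k1)
    h1 ^= length
    h2 ^= length
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1 = _fmix(h1)
    h2 = _fmix(h2)
    return (h1 + h2) & MASK64


def token_for_key(key: bytes) -> int:
    """Signed 64-bit token; the minimum long is mapped to the maximum long."""
    h1 = murmur3_h1(key)
    token = h1 - (1 << 64) if h1 >= (1 << 63) else h1
    return (1 << 63) - 1 if token == -(1 << 63) else token


# Usage: feed the raw key bytes from the exception (excerpt shown here).
print(token_for_key(bytes.fromhex("5c203231edb9b9235c")))
```

With the token in hand, the partition could then be looked up with a
token(...) restriction instead of the undecodable string itself.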

Any suggestion is more than welcome :S

Cheers,
Stefano

WARN  [SharedPool-Worker-18] 2018-01-02 16:49:43,861 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-18,5,main]: {}
org.apache.cassandra.serializers.MarshalException: Invalid UTF-8 bytes
433a5c484a5c5858505c17443535575c5d425a5c515f5c202031332031335c2e4d5c2c364256563b5c203230dbb74c5c2754345c532031452121445c7f584d7f7f555c4e485757455c56203330585c465c203144334f5f5f345c29605c38495c2033415d595c4c335c203134365c364e4a5c2c46e9bdbd485c5a39397e5c203231edb9b9235cc6a7592d432d435d5c354e595c45495c1738525c442032324747485c55203230265c43355c4b353b565c4429dbb7495c23525c2031442031443b5c45552e141457
    at org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:45) ~[apache-cassandra-3.0.15.jar:3.0.15]
    at org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:28) ~[apache-cassandra-3.0.15.jar:3.0.15]
    at org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:130) ~[apache-cassandra-3.0.15.jar:3.0.15]
    at org.apache.cassandra.dht.AbstractBounds.format(AbstractBounds.java:130) ~[apache-cassandra-3.0.15.jar:3.0.15]
    at org.apache.cassandra.dht.AbstractBounds.getString(AbstractBounds.java:123) ~[apache-cassandra-3.0.15.jar:3.0.15]
    at org.apache.cassandra.db.PartitionRangeReadCommand.queryStorage(PartitionRangeReadCommand.java:245) ~[apache-cassandra-3.0.15.jar:3.0.15]
    at org.apache.cassandra.db.ReadCommand.executeLocally(ReadCommand.java:405) ~[apache-cassandra-3.0.15.jar:3.0.15]
    at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:45) ~[apache-cassandra-3.0.15.jar:3.0.15]
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.0.15.jar:3.0.15]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_131]
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[apache-cassandra-3.0.15.jar:3.0.15]
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [apache-cassandra-3.0.15.jar:3.0.15]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.0.15.jar:3.0.15]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
