cassandra-user mailing list archives

From Stefano Ortolani <ostef...@gmail.com>
Subject Re: Data corruption, invalid UTF-8 bytes
Date Wed, 03 Jan 2018 12:12:57 GMT
Little update.

I've managed to compute the token, and I can indeed SELECT the row from
CQLSH.
Interestingly enough, if I use CQLSH I do not get the exception (even though
the string is printed out).

I am now wondering whether, rather than data corruption, the error is
related to the read path used by the Java driver, but I fail to see how
that could be different when using CQLSH (Python).
Is anybody more familiar with the read path able to shed some light on
the stack trace?
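
One way to narrow this down is to run the logged bytes through a strict
UTF-8 decoder and see exactly where it gives up. Below is a minimal sketch
in Python 3; the hex string is copied from the exception quoted further
down, and the helper name first_invalid_utf8 is only illustrative. It also
retries with the 'surrogatepass' error handler, which tolerates encoded
surrogate code points and nothing else. If that second decode succeeds, the
key is well-formed apart from a surrogate. This might also explain the CQLSH
difference, assuming CQLSH runs on Python 2, whose UTF-8 codec accepts
encoded surrogates; that part is an assumption and has not been verified
here.

```python
# Hex dump copied from the MarshalException in the quoted stack trace.
KEY_HEX = (
    "433a5c484a5c5858505c17443535575c5d425a5c515f5c20203133203133"
    "5c2e4d5c2c364256563b5c203230dbb74c5c2754345c532031452121445c"
    "7f584d7f7f555c4e485757455c56203330585c465c203144334f5f5f345c"
    "29605c38495c2033415d595c4c335c203134365c364e4a5c2c46e9bdbd48"
    "5c5a39397e5c203231edb9b9235cc6a7592d432d435d5c354e595c45495c"
    "1738525c442032324747485c55203230265c43355c4b353b565c4429dbb7"
    "495c23525c2031442031443b5c45552e141457"
)


def first_invalid_utf8(data):
    """Return (offset, reason) of the first strict UTF-8 error, or None."""
    try:
        data.decode("utf-8")  # strict decoding is the default
    except UnicodeDecodeError as err:
        return err.start, err.reason
    return None


key = bytes.fromhex(KEY_HEX)

problem = first_invalid_utf8(key)
if problem is not None:
    offset, reason = problem
    print("strict decode fails at byte", offset, "-", reason)
    print("offending bytes:", key[offset:offset + 3].hex())

# 'surrogatepass' lets encoded surrogates through but still raises on any
# other malformed sequence, so success here isolates the problem.
lenient = key.decode("utf-8", errors="surrogatepass")
print(ascii(lenient))
```

The ascii() call keeps the output printable regardless of the terminal
encoding, since the decoded key contains control characters.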

Thanks,
Stefano

On Tue, Jan 2, 2018 at 6:44 PM, Stefano Ortolani <ostefano@gmail.com> wrote:

> Hi all,
>
> apparently the year started with a node (version 3.0.15) exhibiting some
> data corruption (discovered by a spark job enumerating all keys).
>
> The exception is attached below.
>
> The invalid string is a partition key, and it is supposed to be a file
> name. If I manually decode the bytes I get something that resembles a path
> but with lots of garbage inside.
>
> Now part of the garbage might be intentional, which means I am still
> wondering whether this is actual data corruption or whether the input
> string was already corrupted. In the latter case, the string should not
> have been inserted though; the driver or Cassandra should have rejected
> the WRITE, am I correct?
>
> I admit I am a bit stuck, because obviously nodetool getsstables doesn't
> work on a key like this.
> Does anybody have any suggestions on how to deal with this situation?
>
> Ideally I would like to identify the sstable, but also check that the
> corruption did not replicate to other nodes, and possibly delete the
> offending partition.
>
> A starting point would be to generate the token associated with the
> offending key. Since I am using the Murmur3 partitioner I was planning to
> just compute the token value of that byte sequence. Would this be a sound
> approach?
>
> Any suggestion is more than welcome :S
>
> Cheers,
> Stefano
>
> WARN  [SharedPool-Worker-18] 2018-01-02 16:49:43,861
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread
> Thread[SharedPool-Worker-18,5,main]: {}
> org.apache.cassandra.serializers.MarshalException: Invalid UTF-8 bytes
> 433a5c484a5c5858505c17443535575c5d425a5c515f5c20203133203133
> 5c2e4d5c2c364256563b5c203230dbb74c5c2754345c532031452121445c
> 7f584d7f7f555c4e485757455c56203330585c465c203144334f5f5f345c
> 29605c38495c2033415d595c4c335c203134365c364e4a5c2c46e9bdbd48
> 5c5a39397e5c203231edb9b9235cc6a7592d432d435d5c354e595c45495c
> 1738525c442032324747485c55203230265c43355c4b353b565c4429dbb7
> 495c23525c2031442031443b5c45552e141457
> at org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(
> AbstractTextSerializer.java:45) ~[apache-cassandra-3.0.15.jar:3.0.15]
> at org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(
> AbstractTextSerializer.java:28) ~[apache-cassandra-3.0.15.jar:3.0.15]
> at org.apache.cassandra.db.marshal.AbstractType.
> getString(AbstractType.java:130) ~[apache-cassandra-3.0.15.jar:3.0.15]
> at org.apache.cassandra.dht.AbstractBounds.format(AbstractBounds.java:130)
> ~[apache-cassandra-3.0.15.jar:3.0.15]
> at org.apache.cassandra.dht.AbstractBounds.getString(AbstractBounds.java:123)
> ~[apache-cassandra-3.0.15.jar:3.0.15]
> at org.apache.cassandra.db.PartitionRangeReadCommand.queryStorage(
> PartitionRangeReadCommand.java:245) ~[apache-cassandra-3.0.15.jar:3.0.15]
> at org.apache.cassandra.db.ReadCommand.executeLocally(ReadCommand.java:405)
> ~[apache-cassandra-3.0.15.jar:3.0.15]
> at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(
> ReadCommandVerbHandler.java:45) ~[apache-cassandra-3.0.15.jar:3.0.15]
> at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
> ~[apache-cassandra-3.0.15.jar:3.0.15]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ~[na:1.8.0_131]
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
> ~[apache-cassandra-3.0.15.jar:3.0.15]
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
> [apache-cassandra-3.0.15.jar:3.0.15]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
> [apache-cassandra-3.0.15.jar:3.0.15]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
>
>
>
