cassandra-commits mailing list archives

From "Alex Petrov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-13696) Digest mismatch Exception if hints file has UnknownColumnFamily
Date Thu, 20 Jul 2017 15:44:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094602#comment-16094602 ]

Alex Petrov edited comment on CASSANDRA-13696 at 7/20/17 3:43 PM:
------------------------------------------------------------------

I agree that we should also return the correct version from the hints service (as [~jay.zhuang]
already mentioned), like [here|https://github.com/apache/cassandra/compare/trunk...ifesdjeen:13696-3.0],
the same way we do in the commit log descriptor.

This would also make the issue go away for the same-version case, and since it makes the service
pick a different code path, I'd say it's necessary to include it as well.

WRT the patch itself, it might be better to just call {{resetCrc}} explicitly and still
return null, like I did [here|https://github.com/apache/cassandra/compare/trunk...ifesdjeen:13696-3.0#diff-cf15f9cac67d8b2f3e581129d617df16R242]?
{{hint}} is a local variable, and setting it and carrying on makes the logic a bit harder
to understand. For example, it was non-obvious to me that this boolean method also
does some buffer rewinding / state resetting under the hood.
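
To illustrate the "call {{resetCrc}} explicitly and return null" shape, here is a minimal, self-contained sketch. It is not the actual patch and not Cassandra's real reader: the record layout (flag byte, length, payload, CRC), the {{ChecksummedInput}} class and the {{readRecord}} / {{writeRecord}} helpers are all hypothetical, made up only to show the control flow. The point is that the skip path resets the checksum state visibly and returns right away, instead of relying on a later check call to reset it as a side effect.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class ResetCrcSketch {
    /** Hypothetical stand-in for a checksummed input; NOT Cassandra's real class. */
    static final class ChecksummedInput {
        private final ByteBuffer buffer;
        private final CRC32 crc = new CRC32();

        ChecksummedInput(ByteBuffer buffer) { this.buffer = buffer; }

        byte readByte() { byte b = buffer.get(); crc.update(b); return b; }
        byte[] readBytes(int n) { byte[] b = new byte[n]; buffer.get(b); crc.update(b); return b; }
        int readInt() { return ByteBuffer.wrap(readBytes(Integer.BYTES)).getInt(); }

        /** Skipped bytes never pass through the running checksum. */
        void skip(int n) { buffer.position(buffer.position() + n); }

        /** Discards the running checksum so the next record starts clean. */
        void resetCrc() { crc.reset(); }

        /** Compares the stored checksum with the running one, then resets the state. */
        boolean checkCrc() {
            boolean ok = buffer.getInt() == (int) crc.getValue();
            crc.reset();
            return ok;
        }
    }

    /** Assumed layout: [flag: 1 = table known][int length][payload][int crc of all preceding bytes]. */
    static void writeRecord(ByteBuffer out, boolean known, String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        int start = out.position();
        out.put((byte) (known ? 1 : 0)).putInt(body.length).put(body);
        CRC32 crc = new CRC32();
        crc.update(out.array(), start, out.position() - start);
        out.putInt((int) crc.getValue());
    }

    /** Returns the payload, or null if the record belongs to an unknown (dropped) table. */
    static String readRecord(ChecksummedInput in) {
        boolean known = in.readByte() == 1;
        int length = in.readInt();
        if (!known) {
            in.skip(length + Integer.BYTES); // skip the payload and its stored checksum
            in.resetCrc();                   // explicit: drop the partially computed checksum
            return null;                     // early return, no local variable carried along
        }
        String payload = new String(in.readBytes(length), StandardCharsets.UTF_8);
        if (!in.checkCrc())
            throw new UncheckedIOException(new IOException("Digest mismatch exception"));
        return payload;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        writeRecord(buf, false, "dropped"); // record for a table that no longer exists
        writeRecord(buf, true, "hint-2");   // ordinary record that must still verify
        buf.flip();

        ChecksummedInput in = new ChecksummedInput(buf);
        System.out.println("first=" + readRecord(in));  // prints first=null
        System.out.println("second=" + readRecord(in)); // prints second=hint-2
    }
}
```

In this sketch, dropping the {{in.resetCrc()}} call leaves the flag and length bytes of the skipped record in the running checksum, so the next record fails its check. That is the kind of hidden coupling the explicit call is meant to make obvious.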


> Digest mismatch Exception if hints file has UnknownColumnFamily
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-13696
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13696
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jay Zhuang
>            Assignee: Jay Zhuang
>            Priority: Blocker
>             Fix For: 3.0.x, 3.11.x, 4.x
>
>
> {noformat}
> WARN  [HintsDispatcher:2] 2017-07-16 22:00:32,579 HintsReader.java:235 - Failed to read a hint for /127.0.0.2: a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0 - table with id 3882bbb0-6a71-11e7-9bca-2759083e3964 is unknown in file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints
> ERROR [HintsDispatcher:2] 2017-07-16 22:00:32,580 HintsDispatchExecutor.java:234 - Failed to dispatch hints file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints: file is corrupted ({})
> org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch exception
>     at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:199) ~[main/:na]
>     at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:164) ~[main/:na]
>     at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na]
>     at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:157) ~[main/:na]
>     at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:139) ~[main/:na]
>     at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:123) ~[main/:na]
>     at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:95) ~[main/:na]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:268) [main/:na]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:251) [main/:na]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:229) [main/:na]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:208) [main/:na]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [main/:na]
>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111]
> Caused by: java.io.IOException: Digest mismatch exception
>     at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNextInternal(HintsReader.java:216) ~[main/:na]
>     at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:190) ~[main/:na]
>     ... 16 common frames omitted
> {noformat}
> It causes multiple Cassandra nodes to stop [by default|https://github.com/apache/cassandra/blob/cassandra-3.0/conf/cassandra.yaml#L188].
> Here are the steps to reproduce on a 3-node cluster with RF=3:
> 1. Stop node1.
> 2. Write some data at consistency level QUORUM (or ONE); this generates hints files on node2/node3.
> 3. Drop the table.
> 4. Start node1.
> node2/node3 will report "corrupted hints file" and stop. The impact is very bad for a large cluster: when it happens, almost all the nodes go down at the same time, and we have to remove all the hints files (which contain the dropped table) to bring the nodes back.




