cassandra-commits mailing list archives

From "Benjamin Roth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12728) Handling partially written hint files
Date Wed, 30 Nov 2016 21:37:58 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709858#comment-15709858 ]

Benjamin Roth commented on CASSANDRA-12728:
-------------------------------------------

+1

Let the operator decide whether they prefer a crash or an inconsistency. When the node does not
crash, the failure should be logged as an error. That way you can check the error logs and, instead
of having to recover from a crash, start a repair if desired. Repair is the only recovery action
available either way. The only open question is how to fail and how the operator gets notified.
If the node crashes and the operator notices too late, the situation may become even worse
once the hints expire.
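A minimal Java sketch of the operator-selectable behaviour argued for above: either fail hard, or log the corrupted file at error level and keep running so that a repair can be scheduled. The `Policy` enum, the `handle` method and the log text are hypothetical illustrations, not Cassandra's actual configuration options or API.

```java
import java.io.EOFException;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical sketch of an operator-selectable reaction to a corrupted or
 * truncated hints file. Names are illustrative, not Cassandra's API or config.
 */
public class CorruptHintsPolicySketch {
    private static final Logger LOG =
            Logger.getLogger(CorruptHintsPolicySketch.class.getName());

    /** DIE: fail hard. LOG_AND_SKIP: log at error level and keep running. */
    public enum Policy { DIE, LOG_AND_SKIP }

    /**
     * Returns true if the file was skipped and the node keeps running;
     * throws IllegalStateException if the operator chose DIE.
     */
    public static boolean handle(String fileName, Exception cause, Policy policy) {
        if (policy == Policy.DIE) {
            throw new IllegalStateException("Corrupted hints file " + fileName, cause);
        }
        // SEVERE is java.util.logging's error level; this is the entry an operator would look for.
        LOG.log(Level.SEVERE, "Skipping corrupted hints file " + fileName
                + "; the target replica may be inconsistent, consider running a repair", cause);
        return true;
    }

    public static void main(String[] args) {
        Exception eof = new EOFException();
        System.out.println("skipped: " + handle("example-1.hints", eof, Policy.LOG_AND_SKIP));
        try {
            handle("example-1.hints", eof, Policy.DIE);
        } catch (IllegalStateException e) {
            System.out.println("died: " + e.getMessage());
        }
    }
}
```

Throwing preserves the "fail loudly" behaviour; logging at SEVERE keeps the node up while still leaving an entry in the error log that the operator can act on, which is the trade-off the comment describes.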

> Handling partially written hint files
> -------------------------------------
>
>                 Key: CASSANDRA-12728
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12728
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sharvanath Pathak
>            Assignee: Aleksey Yeschenko
>              Labels: lhf
>         Attachments: CASSANDRA-12728.patch
>
>
> {noformat}
> ERROR [HintsDispatcher:1] 2016-09-28 17:44:43,397 HintsDispatchExecutor.java:225 - Failed to dispatch hints file d5d7257c-9f81-49b2-8633-6f9bda6e3dea-1474892654160-1.hints: file is corrupted ({})
> org.apache.cassandra.io.FSReadError: java.io.EOFException
>         at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:282) ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:252) ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:156) ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:137) ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:119) ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:91) ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:259) [apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:242) [apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:220) [apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:199) [apache-cassandra-3.0.6.jar:3.0.6]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_77]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_77]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
> Caused by: java.io.EOFException: null
>         at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:68) ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60) ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.hints.ChecksummedDataInput.readFully(ChecksummedDataInput.java:126) ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402) ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.hints.HintsReader$BuffersIterator.readBuffer(HintsReader.java:310) ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNextInternal(HintsReader.java:301) ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:278) ~[apache-cassandra-3.0.6.jar:3.0.6]
>         ... 15 common frames omitted
> {noformat}
> We found that the hint file was truncated because a hard reboot occurred around the time
> of the last write to the file. I think we basically need to handle partially written hint
> files. Also, the CRC file does not exist in this case (probably because the node crashed
> while writing the hints file). Maybe ignoring and cleaning up such partially written hint
> files could be a way to fix this?
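The reporter's suggestion, treating a cut-off tail as the end of the file rather than a fatal error, can be illustrated with a small reader. This sketch assumes a simplified, hypothetical record layout (4-byte length, payload, 8-byte CRC32 of the payload); it is not Cassandra's actual hints file format, and `TruncatedHintsReaderSketch`, `encode` and `readAll` are illustrative names. Because each record carries its own checksum, it also does not depend on a separate CRC file, which the issue notes was missing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

/**
 * Hypothetical sketch, NOT Cassandra's hints format: each record is
 * [int length][payload bytes][long CRC32 of payload], big-endian.
 * A cut-off or checksum-failing record ends the read instead of throwing.
 */
public class TruncatedHintsReaderSketch {

    /** Complete records read, and whether unusable bytes followed them. */
    public static final class Result {
        public final List<byte[]> records;
        public final boolean truncatedTail;

        Result(List<byte[]> records, boolean truncatedTail) {
            this.records = records;
            this.truncatedTail = truncatedTail;
        }
    }

    static long crc(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    /** Encodes payloads in the hypothetical layout described above. */
    public static byte[] encode(List<byte[]> payloads) {
        int size = 0;
        for (byte[] p : payloads) size += Integer.BYTES + p.length + Long.BYTES;
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] p : payloads) {
            buf.putInt(p.length);
            buf.put(p);
            buf.putLong(crc(p));
        }
        return buf.array();
    }

    /** Reads records until a clean end of data or the first incomplete or bad record. */
    public static Result readAll(ByteBuffer buf) {
        List<byte[]> records = new ArrayList<>();
        while (buf.hasRemaining()) {
            // Fewer than 4 bytes left: the length header itself was cut off.
            if (buf.remaining() < Integer.BYTES) return new Result(records, true);
            int length = buf.getInt();
            // Invalid length, or payload + checksum not fully present: partial write.
            if (length < 0 || buf.remaining() < (long) length + Long.BYTES) {
                return new Result(records, true);
            }
            byte[] payload = new byte[length];
            buf.get(payload);
            // Checksum mismatch: treat it as the end of usable data.
            if (crc(payload) != buf.getLong()) return new Result(records, true);
            records.add(payload);
        }
        return new Result(records, false);
    }

    public static void main(String[] args) {
        byte[] file = encode(Arrays.asList(
                "a".getBytes(StandardCharsets.US_ASCII),
                "bc".getBytes(StandardCharsets.US_ASCII)));
        // Simulate a hard reboot during the last write by dropping the final 3 bytes.
        byte[] cut = Arrays.copyOf(file, file.length - 3);
        Result r = readAll(ByteBuffer.wrap(cut));
        System.out.println(r.records.size() + " complete record(s), truncatedTail=" + r.truncatedTail);
    }
}
```

Records before the damaged tail can still be dispatched; the caller can then log the truncation and delete or quarantine the file. This sketch treats any checksum mismatch as the end of usable data, so it cannot tell a torn final write apart from corruption in the middle of the file; a real implementation would probably want to distinguish the two.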



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
