Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id CE283200BA0 for ; Fri, 14 Oct 2016 23:28:27 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id CCC6C160AD3; Fri, 14 Oct 2016 21:28:27 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 1EFB8160ADD for ; Fri, 14 Oct 2016 23:28:26 +0200 (CEST) Received: (qmail 89773 invoked by uid 500); 14 Oct 2016 21:28:21 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 89599 invoked by uid 99); 14 Oct 2016 21:28:20 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 14 Oct 2016 21:28:20 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id B607F2C4C79 for ; Fri, 14 Oct 2016 21:28:20 +0000 (UTC) Date: Fri, 14 Oct 2016 21:28:20 +0000 (UTC) From: "Garvit Juniwal (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (CASSANDRA-12728) Handling partially written hint files MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Fri, 14 Oct 2016 21:28:28 -0000 [ https://issues.apache.org/jira/browse/CASSANDRA-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Garvit Juniwal updated CASSANDRA-12728: --------------------------------------- Attachment: CASSANDRA-12728.patch If a hints file gets truncated due to a crash, the last hint might have been written incompletely. Presence of a corrupted hints file can put Cassandra in a crash loop trying o read a corrupted hint over and over. Ignore and discard the last incomplete hint instead of throwing a FSReadError exception. > Handling partially written hint files > ------------------------------------- > > Key: CASSANDRA-12728 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12728 > Project: Cassandra > Issue Type: Bug > Reporter: Sharvanath Pathak > Assignee: Aleksey Yeschenko > Attachments: CASSANDRA-12728.patch > > > {noformat} > ERROR [HintsDispatcher:1] 2016-09-28 17:44:43,397 HintsDispatchExecutor.java:225 - Failed to dispatch hints file d5d7257c-9f81-49b2-8633-6f9bda6e3dea-1474892654160-1.hints: file is corrupted ({}) > org.apache.cassandra.io.FSReadError: java.io.EOFException > at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:282) ~[apache-cassandra-3.0.6.jar:3.0.6] > at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:252) ~[apache-cassandra-3.0.6.jar:3.0.6] > at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.0.6.jar:3.0.6] > at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:156) ~[apache-cassandra-3.0.6.jar:3.0.6] > at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:137) ~[apache-cassandra-3.0.6.jar:3.0.6] > at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:119) ~[apache-cassandra-3.0.6.jar:3.0.6] > at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:91) ~[apache-cassandra-3.0.6.jar:3.0.6] > at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:259) [apache-cassandra-3.0.6.jar:3.0.6] > at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:242) [apache-cassandra-3.0.6.jar:3.0.6] > at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:220) [apache-cassandra-3.0.6.jar:3.0.6] > at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:199) [apache-cassandra-3.0.6.jar:3.0.6] > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_77] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_77] > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77] > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77] > Caused by: java.io.EOFException: null > at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:68) ~[apache-cassandra-3.0.6.jar:3.0.6] > at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60) ~[apache-cassandra-3.0.6.jar:3.0.6] > at org.apache.cassandra.hints.ChecksummedDataInput.readFully(ChecksummedDataInput.java:126) ~[apache-cassandra-3.0.6.jar:3.0.6] > at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402) ~[apache-cassandra-3.0.6.jar:3.0.6] > at org.apache.cassandra.hints.HintsReader$BuffersIterator.readBuffer(HintsReader.java:310) ~[apache-cassandra-3.0.6.jar:3.0.6] > at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNextInternal(HintsReader.java:301) ~[apache-cassandra-3.0.6.jar:3.0.6] > at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:278) ~[apache-cassandra-3.0.6.jar:3.0.6] > ... 15 common frames omitted > {noformat} > We've found out that the hint file was truncated because there was a hard reboot around the time of last write to the file. I think we basically need to handle partially written hint files. Also, the CRC file does not exist in this case (probably because it crashed while writing the hints file). May be ignoring and cleaning up such partially written hint files can be a way to fix this? -- This message was sent by Atlassian JIRA (v6.3.4#6332)