Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id C3E9B200CD6 for ; Mon, 17 Jul 2017 08:22:08 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id C251F1643B7; Mon, 17 Jul 2017 06:22:08 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id BA9B21643B6 for ; Mon, 17 Jul 2017 08:22:07 +0200 (CEST) Received: (qmail 99903 invoked by uid 500); 17 Jul 2017 06:22:05 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 99788 invoked by uid 99); 17 Jul 2017 06:22:05 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 17 Jul 2017 06:22:05 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id BC8AA180145 for ; Mon, 17 Jul 2017 06:22:04 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.202 X-Spam-Level: X-Spam-Status: No, score=-99.202 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id OS5ECmGlyFR8 for ; Mon, 17 Jul 2017 06:22:02 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id 41C1C60CD9 for ; Mon, 17 Jul 2017 06:22:01 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 65CDBE0B4A for ; Mon, 17 Jul 2017 06:22:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 2566324760 for ; Mon, 17 Jul 2017 06:22:00 +0000 (UTC) Date: Mon, 17 Jul 2017 06:22:00 +0000 (UTC) From: "Jay Zhuang (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (CASSANDRA-13696) Digest mismatch Exception if hints file has UnknownColumnFamily MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Mon, 17 Jul 2017 06:22:08 -0000 [ https://issues.apache.org/jira/browse/CASSANDRA-13696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-13696: ----------------------------------- Description: {noformat} WARN [HintsDispatcher:2] 2017-07-16 22:00:32,579 HintsReader.java:235 - Failed to read a hint for /127.0.0.2: a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0 - table with id 3882bbb0-6a71-11e7-9bca-2759083e3964 is unknown in file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints ERROR [HintsDispatcher:2] 2017-07-16 22:00:32,580 HintsDispatchExecutor.java:234 - Failed to dispatch hints file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints: file is corrupted ({}) org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch exception at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:199) ~[main/:na] at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:164) ~[main/:na] at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na] at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:157) ~[main/:na] at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:139) ~[main/:na] at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:123) ~[main/:na] at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:95) ~[main/:na] at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:268) [main/:na] at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:251) [main/:na] at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:229) [main/:na] at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:208) [main/:na] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111] at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [main/:na] at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111] Caused by: java.io.IOException: Digest mismatch exception at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNextInternal(HintsReader.java:216) ~[main/:na] at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:190) ~[main/:na] ... 16 common frames omitted {noformat} It causes multiple cassandra nodes stop [by default|https://github.com/apache/cassandra/blob/cassandra-3.0/conf/cassandra.yaml#L188]. Here is the reproduce steps on a 3 nodes cluster, RF=3: 1. stop node1 2. send some data with quorum (or one), it will generate hints file on node2/node3 3. drop the table 4. start node1 node2/node3 will report "corrupted hints file" and stop. The impact is very bad for a large cluster, when it happens, almost all the nodes are down at the same time and we have to remove all the hints files (which contain the dropped table) to bring the node back. was: {noformat} WARN [HintsDispatcher:2] 2017-07-16 22:00:32,579 HintsReader.java:235 - Failed to read a hint for /127.0.0.2: a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0 - table with id 3882bbb0-6a71-11e7-9bca-2759083e3964 is unknown in file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints ERROR [HintsDispatcher:2] 2017-07-16 22:00:32,580 HintsDispatchExecutor.java:234 - Failed to dispatch hints file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints: file is corrupted ({}) org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch exception at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:199) ~[main/:na] at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:164) ~[main/:na] at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na] at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:157) ~[main/:na] at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:139) ~[main/:na] at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:123) ~[main/:na] at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:95) ~[main/:na] at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:268) [main/:na] at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:251) [main/:na] at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:229) [main/:na] at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:208) [main/:na] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111] at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [main/:na] at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111] Caused by: java.io.IOException: Digest mismatch exception at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNextInternal(HintsReader.java:216) ~[main/:na] at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:190) ~[main/:na] ... 16 common frames omitted {noformat} It causes multiple cassandra nodes stop [by default|https://github.com/apache/cassandra/blob/cassandra-3.0/conf/cassandra.yaml#L188]: Here is the reproduce steps on a 3 nodes cluster, RF=3: 1. stop node1 2. send some data with quorum (or one) 3. drop the table 4. start node1 node2 and node3 will report "corrupted hints file" and stop. The impact is very bad for a large cluster, when it happens, almost all the nodes are down at the same time. > Digest mismatch Exception if hints file has UnknownColumnFamily > --------------------------------------------------------------- > > Key: CASSANDRA-13696 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13696 > Project: Cassandra > Issue Type: Bug > Components: Core > Reporter: Jay Zhuang > Assignee: Jay Zhuang > Priority: Critical > > {noformat} > WARN [HintsDispatcher:2] 2017-07-16 22:00:32,579 HintsReader.java:235 - Failed to read a hint for /127.0.0.2: a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0 - table with id 3882bbb0-6a71-11e7-9bca-2759083e3964 is unknown in file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints > ERROR [HintsDispatcher:2] 2017-07-16 22:00:32,580 HintsDispatchExecutor.java:234 - Failed to dispatch hints file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints: file is corrupted ({}) > org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch exception > at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:199) ~[main/:na] > at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:164) ~[main/:na] > at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na] > at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:157) ~[main/:na] > at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:139) ~[main/:na] > at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:123) ~[main/:na] > at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:95) ~[main/:na] > at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:268) [main/:na] > at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:251) [main/:na] > at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:229) [main/:na] > at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:208) [main/:na] > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111] > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111] > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111] > at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [main/:na] > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111] > Caused by: java.io.IOException: Digest mismatch exception > at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNextInternal(HintsReader.java:216) ~[main/:na] > at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:190) ~[main/:na] > ... 16 common frames omitted > {noformat} > It causes multiple cassandra nodes stop [by default|https://github.com/apache/cassandra/blob/cassandra-3.0/conf/cassandra.yaml#L188]. > Here is the reproduce steps on a 3 nodes cluster, RF=3: > 1. stop node1 > 2. send some data with quorum (or one), it will generate hints file on node2/node3 > 3. drop the table > 4. start node1 > node2/node3 will report "corrupted hints file" and stop. The impact is very bad for a large cluster, when it happens, almost all the nodes are down at the same time and we have to remove all the hints files (which contain the dropped table) to bring the node back. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org For additional commands, e-mail: commits-help@cassandra.apache.org