Date: Mon, 17 Jul 2017 20:49:00 +0000 (UTC)
From: "Jeff Jirsa (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-13696) Digest mismatch Exception if hints file has UnknownColumnFamily

    [ https://issues.apache.org/jira/browse/CASSANDRA-13696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090530#comment-16090530 ]

Jeff Jirsa commented on CASSANDRA-13696:
----------------------------------------

Looks pretty serious; I'll try to get to it soon. In the short term, your trunk patch doesn't actually compile ({{CFMetaData}} is gone in trunk). Can you fix that and push that branch?

> Digest mismatch Exception if hints file has UnknownColumnFamily
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-13696
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13696
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jay Zhuang
>            Assignee: Jay Zhuang
>            Priority: Critical
>
> {noformat}
> WARN [HintsDispatcher:2] 2017-07-16 22:00:32,579 HintsReader.java:235 - Failed to read a hint for /127.0.0.2: a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0 - table with id 3882bbb0-6a71-11e7-9bca-2759083e3964 is unknown in file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints
> ERROR [HintsDispatcher:2] 2017-07-16 22:00:32,580 HintsDispatchExecutor.java:234 - Failed to dispatch hints file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints: file is corrupted ({})
> org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch exception
>     at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:199) ~[main/:na]
>     at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:164) ~[main/:na]
>     at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na]
>     at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:157) ~[main/:na]
>     at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:139) ~[main/:na]
>     at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:123) ~[main/:na]
>     at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:95) ~[main/:na]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:268) [main/:na]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:251) [main/:na]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:229) [main/:na]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:208) [main/:na]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [main/:na]
>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111]
> Caused by: java.io.IOException: Digest mismatch exception
>     at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNextInternal(HintsReader.java:216) ~[main/:na]
>     at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:190) ~[main/:na]
>     ... 16 common frames omitted
> {noformat}
> It causes multiple Cassandra nodes to stop [by default|https://github.com/apache/cassandra/blob/cassandra-3.0/conf/cassandra.yaml#L188].
> Here are the steps to reproduce on a 3-node cluster with RF=3:
> 1. Stop node1.
> 2. Write some data at QUORUM (or ONE); this generates hints files on node2/node3.
> 3. Drop the table.
> 4. Start node1.
> node2/node3 will report "corrupted hints file" and stop. The impact is very bad for a large cluster: when it happens, almost all the nodes go down at the same time, and we have to remove all the hints files that contain the dropped table to bring the nodes back.
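
The report above does not say why the digest check fails once a hint for a dropped table is encountered, so the following is only a minimal Python sketch of one defensive pattern. It is not Cassandra's hints file format and not the patch under discussion. The record layout (4-byte length, 16-byte table id, payload, CRC32 trailer) and the helpers `write_record` and `read_records` are made up for illustration. The reader verifies the checksum over the raw bytes and advances past the whole record before it looks at the table id, so a record for an unknown table is skipped rather than reported as corruption.

```python
import struct
import zlib

TABLE_ID_LEN = 16  # assumed fixed-width table id, e.g. the raw bytes of a UUID


def write_record(table_id: bytes, payload: bytes) -> bytes:
    """Encode one record as: length | table id | payload | CRC32 (hypothetical layout)."""
    if len(table_id) != TABLE_ID_LEN:
        raise ValueError("table id must be 16 bytes")
    body = table_id + payload
    header = struct.pack(">I", len(body))
    crc = zlib.crc32(header + body)
    return header + body + struct.pack(">I", crc)


def read_records(buf: bytes, known_tables: set):
    """Yield (table_id, payload) for known tables and silently skip unknown ones.

    The checksum is computed over the raw bytes and the cursor is moved past
    the full record before the table id is interpreted, so skipping a record
    cannot desynchronise the reader or the digest.
    """
    pos = 0
    while pos < len(buf):
        header = buf[pos:pos + 4]
        (size,) = struct.unpack(">I", header)
        body = buf[pos + 4:pos + 4 + size]
        (stored_crc,) = struct.unpack(">I", buf[pos + 4 + size:pos + 8 + size])
        pos += 8 + size  # always consume the whole record

        if zlib.crc32(header + body) != stored_crc:
            raise IOError("Digest mismatch")  # genuine corruption

        table_id, payload = body[:TABLE_ID_LEN], body[TABLE_ID_LEN:]
        if table_id not in known_tables:
            continue  # table was dropped: skip the record, keep reading
        yield table_id, payload
```

With this layout, a file that mixes records for a live table and a dropped table yields only the live table's records, while a flipped byte inside any record still raises the digest error.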