Date: Mon, 3 Aug 2015 16:30:05 +0000 (UTC)
From: "Mayuresh Gharat (JIRA)"
To: dev@kafka.apache.org
Reply-To: dev@kafka.apache.org
Subject: [jira] [Commented] (KAFKA-1860) File system errors are not detected unless Kafka tries to write


    [ https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652067#comment-14652067 ]

Mayuresh Gharat commented on KAFKA-1860:
----------------------------------------

[~guozhang] ping.

> File system errors are not detected unless Kafka tries to write
> ---------------------------------------------------------------
>
>                 Key: KAFKA-1860
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1860
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Mayuresh Gharat
>             Fix For: 0.9.0
>
>         Attachments: KAFKA-1860.patch
>
>
> When the disk (raid with caches dir) dies on a Kafka broker, typically the filesystem gets mounted into read-only mode, and hence when Kafka tries to read the disk, they'll get a FileNotFoundException with the read-only errno set (EROFS).
> However, as long as there is no produce request received, hence no writes attempted on the disks, Kafka will not exit on such FATAL error (when the disk starts working again, Kafka might think some files are gone while they will reappear later as raid comes back online). Instead it keeps spilling exceptions like:
> {code}
> 2015/01/07 09:47:41.543 ERROR [KafkaScheduler] [kafka-scheduler-1] [kafka-server] [] Uncaught exception in scheduled task 'kafka-recovery-point-checkpoint'
> java.io.FileNotFoundException: /export/content/kafka/i001_caches/recovery-point-offset-checkpoint.tmp (Read-only file system)
> 	at java.io.FileOutputStream.open(Native Method)
> 	at java.io.FileOutputStream.<init>(FileOutputStream.java:206)
> 	at java.io.FileOutputStream.<init>(FileOutputStream.java:156)
> 	at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
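
For readers of the archive, the failure mode in the quoted description (a filesystem that has been remounted read-only goes unnoticed until something writes to it) can be illustrated with a small write probe. This is only a sketch under stated assumptions and is not the attached KAFKA-1860.patch or any Kafka API: the class name DiskHealthProbe and both method names are hypothetical, and the message check relies on the "(Read-only file system)" text from the log above, which depends on the OS and locale.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Kafka: probes a directory by writing to it,
// so that a read-only remount is noticed without waiting for a produce request.
public class DiskHealthProbe {

    /** Returns true if a small temp file can be created and deleted in dir. */
    public static boolean isWritable(Path dir) {
        try {
            Path probe = Files.createTempFile(dir, "disk-probe", ".tmp");
            Files.delete(probe);
            return true;
        } catch (IOException | SecurityException e) {
            // Covers a read-only mount, a missing directory, and permission errors.
            return false;
        }
    }

    /** Heuristic only: matches the "(Read-only file system)" text seen in the log. */
    public static boolean looksLikeReadOnlyFs(IOException e) {
        String msg = e.getMessage();
        return msg != null && msg.contains("Read-only file system");
    }

    public static void main(String[] args) {
        // Directory to probe; defaults to the JVM temp dir when no argument is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        if (!isWritable(dir)) {
            System.err.println("FATAL: directory is not writable: " + dir);
            System.exit(1);
        }
        System.out.println("OK: " + dir + " is writable");
    }
}
```

A periodic task could call isWritable on each log directory and treat a false result as fatal, rather than logging the exception and carrying on as in the trace above. Whether exiting is the right reaction is a design decision for the broker and is not settled by this sketch.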