Date: Mon, 19 Mar 2018 09:34:00 +0000 (UTC)
From: "Chetan Pandey (JIRA)"
To: jira@kafka.apache.org
Subject: [jira] [Commented] (KAFKA-6582) Partitions get underreplicated, with a single ISR, and doesn't recover. Other brokers do not take over and we need to manually restart the broker.

    [ https://issues.apache.org/jira/browse/KAFKA-6582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404550#comment-16404550 ]

Chetan Pandey commented on KAFKA-6582:
--------------------------------------

I am facing the same issue while upgrading our cluster from 0.8.2.1 to 1.0.
After starting broker it starts giving this exception

java.io.IOException: Connection to 1 was disconnected before the response was read
        at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:95)
        at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:96)
        at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:205)
        at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:41)
        at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:149)
        at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)

> Partitions get underreplicated, with a single ISR, and doesn't recover.
> Other brokers do not take over and we need to manually restart the broker.
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6582
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6582
>             Project: Kafka
>          Issue Type: Bug
>          Components: network
>    Affects Versions: 1.0.0
>         Environment: Ubuntu 16.04
> Linux kafka04 4.4.0-109-generic #132-Ubuntu SMP Tue Jan 9 19:52:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> java version "9.0.1"
> Java(TM) SE Runtime Environment (build 9.0.1+11)
> Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode)
> but also tried with the latest JVM 8 before with the same result.
>            Reporter: Jurriaan Pruis
>            Priority: Major
>
> Partitions get underreplicated, with a single ISR, and doesn't recover. Other brokers do not take over and we need to manually restart the 'single ISR' broker (if you describe the partitions of replicated topic it is clear that some partitions are only in sync on this broker).
> This bug resembles KAFKA-4477 a lot, but since that issue is marked as resolved this is probably something else but similar.
> We have the same issue (or at least it looks pretty similar) on Kafka 1.0.
> Since upgrading to Kafka 1.0 in November 2017 we've had these issues (we've upgraded from Kafka 0.10.2.1).
> This happens almost every 24-48 hours on a random broker.
> This is why we currently have a cronjob which restarts every broker every 24 hours.
> During this issue the ISR shows the following server log:
> {code:java}
> [2018-02-20 12:02:08,342] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.148.20:56352-96708 (kafka.network.Processor)
> [2018-02-20 12:02:08,364] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.150.25:54412-96715 (kafka.network.Processor)
> [2018-02-20 12:02:08,349] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.149.18:35182-96705 (kafka.network.Processor)
> [2018-02-20 12:02:08,379] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.150.25:54456-96717 (kafka.network.Processor)
> [2018-02-20 12:02:08,448] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.159.20:36388-96720 (kafka.network.Processor)
> [2018-02-20 12:02:08,683] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.157.110:41922-96740 (kafka.network.Processor)
> {code}
> Also on the ISR broker, the controller log shows this:
> {code:java}
> [2018-02-20 12:02:14,927] INFO [Controller-3-to-broker-3-send-thread]: Controller 3 connected to 10.132.0.32:9092 (id: 3 rack: null) for sending state change requests (kafka.controller.RequestSendThread)
> [2018-02-20 12:02:14,927] INFO [Controller-3-to-broker-0-send-thread]: Controller 3 connected to 10.132.0.10:9092 (id: 0 rack: null) for sending state change requests (kafka.controller.RequestSendThread)
> [2018-02-20 12:02:14,928] INFO [Controller-3-to-broker-1-send-thread]: Controller 3 connected to 10.132.0.12:9092 (id: 1 rack: null) for sending state change requests (kafka.controller.RequestSendThread){code}
> And the non-ISR brokers show these kind of errors:
>
> {code:java}
> 2018-02-20 12:02:29,204] WARN [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=0] Error in fetch to broker 3, request (type=FetchRequest, replicaId=1, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={......................}, isolationLevel=READ_UNCOMMITTED) (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 3 was disconnected before the response was read
>  at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:95)
>  at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:96)
>  at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:205)
>  at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:41)
>  at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:149)
>  at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113)
>  at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
> {code}
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)