kafka-jira mailing list archives

From "Chetan Pandey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-6582) Partitions get underreplicated, with a single ISR, and doesn't recover. Other brokers do not take over and we need to manually restart the broker.
Date Mon, 19 Mar 2018 09:34:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-6582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404550#comment-16404550 ]

Chetan Pandey commented on KAFKA-6582:
--------------------------------------

I am facing the same issue while upgrading our cluster from 0.8.2.1 to 1.0.
After starting the broker, it begins throwing this exception:

java.io.IOException: Connection to 1 was disconnected before the response was read
        at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:95)
        at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:96)
        at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:205)
        at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:41)
        at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:149)
        at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)



> Partitions get underreplicated, with a single ISR, and doesn't recover. Other brokers do not take over and we need to manually restart the broker.
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6582
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6582
>             Project: Kafka
>          Issue Type: Bug
>          Components: network
>    Affects Versions: 1.0.0
>         Environment: Ubuntu 16.04
> Linux kafka04 4.4.0-109-generic #132-Ubuntu SMP Tue Jan 9 19:52:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> java version "9.0.1"
> Java(TM) SE Runtime Environment (build 9.0.1+11)
> Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode) 
> We previously tried the latest JVM 8 as well, with the same result.
>            Reporter: Jurriaan Pruis
>            Priority: Major
>
> Partitions get underreplicated, with a single ISR, and don't recover. Other brokers do not take over, and we need to manually restart the 'single ISR' broker (if you describe the partitions of a replicated topic, it is clear that some partitions are in sync only on this broker).
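To spot the condition described above without reading every partition by eye, the output of `kafka-topics.sh --describe` can be scanned for partitions that have several assigned replicas but only one ISR member. The sketch below is a hypothetical helper (`SingleIsrScanner` is not part of Kafka) and assumes the tab-separated `Topic: <t> Partition: <p> Leader: <l> Replicas: <list> Isr: <list>` layout printed by 1.0-era tooling; the regex may need adjusting for other versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flags partitions whose ISR has shrunk to a single broker
// while more than one replica is assigned (the "single ISR" symptom).
final class SingleIsrScanner {

    // Assumed per-partition line layout of `kafka-topics.sh --describe`:
    // "Topic: <t>  Partition: <p>  Leader: <l>  Replicas: <a,b,c>  Isr: <x,y>"
    private static final Pattern PARTITION_LINE = Pattern.compile(
        "Topic:\\s*(\\S+)\\s+Partition:\\s*(\\d+)\\s+Leader:\\s*(\\S+)"
            + "\\s+Replicas:\\s*([\\d,]+)\\s+Isr:\\s*([\\d,]*)");

    static List<String> singleIsrPartitions(List<String> describeLines) {
        List<String> flagged = new ArrayList<>();
        for (String line : describeLines) {
            Matcher m = PARTITION_LINE.matcher(line);
            if (!m.find()) {
                continue; // topic header lines ("PartitionCount:...") do not match
            }
            int replicaCount = m.group(4).split(",").length;
            String isr = m.group(5);
            int isrCount = isr.isEmpty() ? 0 : isr.split(",").length;
            // Under-replicated with exactly one in-sync member left.
            if (replicaCount > 1 && isrCount == 1) {
                flagged.add(m.group(1) + "-" + m.group(2)
                    + " (leader " + m.group(3) + ", ISR " + isr + ")");
            }
        }
        return flagged;
    }
}
```

Fed the describe output line by line, a partition printed as `Replicas: 3,1,0  Isr: 3` would be reported, while fully replicated partitions are skipped; grouping the results by leader shows which broker is the "single ISR" one.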
> This bug closely resembles KAFKA-4477, but since that issue is marked as resolved, this is probably something different but similar.
> We have the same issue (or at least it looks pretty similar) on Kafka 1.0. 
> We've had these issues since upgrading to Kafka 1.0 in November 2017 (we upgraded from Kafka 0.10.2.1).
> This happens roughly every 24-48 hours on a random broker, which is why we currently have a cron job that restarts every broker every 24 hours.
> During this issue, the ISR broker shows the following server log:
> {code:java}
> [2018-02-20 12:02:08,342] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.148.20:56352-96708 (kafka.network.Processor)
> [2018-02-20 12:02:08,364] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.150.25:54412-96715 (kafka.network.Processor)
> [2018-02-20 12:02:08,349] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.149.18:35182-96705 (kafka.network.Processor)
> [2018-02-20 12:02:08,379] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.150.25:54456-96717 (kafka.network.Processor)
> [2018-02-20 12:02:08,448] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.159.20:36388-96720 (kafka.network.Processor)
> [2018-02-20 12:02:08,683] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.157.110:41922-96740 (kafka.network.Processor)
> {code}
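Each connection id in these warnings appears to encode the broker's listener first and the peer second (`localHost:localPort-remoteHost:remotePort-index`), so tallying the warnings by remote host shows which peers had already hung up before the broker answered. In the excerpt, the remote side is always a 10.14.x.x address, while the controller log lists the brokers at 10.132.0.x, which suggests these are client connections. The helper below is a hypothetical sketch, not Kafka code, and it assumes IPv4 addresses in that id layout.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts "no open connection" warnings per remote host.
final class DroppedResponseCounter {

    // Assumed id layout (IPv4 only): localHost:localPort-remoteHost:remotePort-index
    private static final Pattern CONNECTION_ID = Pattern.compile(
        "no open connection, connection id ([\\d.]+):(\\d+)-([\\d.]+):(\\d+)-(\\d+)");

    static Map<String, Integer> countByRemoteHost(Iterable<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for readable output
        for (String line : logLines) {
            Matcher m = CONNECTION_ID.matcher(line);
            if (m.find()) {
                // group(3) is the remote host that closed its side of the socket
                counts.merge(m.group(3), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Run over the six lines above, it would count two warnings for 10.14.150.25 and one for each of the other four remote hosts.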
> Also on the ISR broker, the controller log shows this:
> {code:java}
> [2018-02-20 12:02:14,927] INFO [Controller-3-to-broker-3-send-thread]: Controller 3 connected to 10.132.0.32:9092 (id: 3 rack: null) for sending state change requests (kafka.controller.RequestSendThread)
> [2018-02-20 12:02:14,927] INFO [Controller-3-to-broker-0-send-thread]: Controller 3 connected to 10.132.0.10:9092 (id: 0 rack: null) for sending state change requests (kafka.controller.RequestSendThread)
> [2018-02-20 12:02:14,928] INFO [Controller-3-to-broker-1-send-thread]: Controller 3 connected to 10.132.0.12:9092 (id: 1 rack: null) for sending state change requests (kafka.controller.RequestSendThread)
> {code}
> And the non-ISR brokers show these kinds of errors:
>  
> {code:java}
> [2018-02-20 12:02:29,204] WARN [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=0] Error in fetch to broker 3, request (type=FetchRequest, replicaId=1, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={......................}, isolationLevel=READ_UNCOMMITTED) (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 3 was disconnected before the response was read
>  at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:95)
>  at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:96)
>  at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:205)
>  at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:41)
>  at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:149)
>  at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113)
>  at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
> {code}
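The fetcher thread name in these warnings carries both the follower (`replicaId`) and the leader it fetches from (`leaderId`), so grouping failures by leader can point at the broker that has become the single ISR (here broker 3, the one the cron job workaround ends up restarting). The following is a hypothetical sketch, not part of Kafka, that assumes the `[ReplicaFetcher replicaId=..., leaderId=..., fetcherId=...] Error in fetch` prefix shown in the log above.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: groups ReplicaFetcher fetch failures by the leader they target.
final class FetchFailureTally {

    // Assumed prefix, as in the WARN line above:
    // "[ReplicaFetcher replicaId=1, leaderId=3, fetcherId=0] Error in fetch to broker 3"
    private static final Pattern FETCH_ERROR = Pattern.compile(
        "\\[ReplicaFetcher replicaId=(\\d+), leaderId=(\\d+), fetcherId=\\d+\\] Error in fetch");

    /** Returns leaderId -> sorted set of follower replicaIds whose fetches to it failed. */
    static Map<Integer, Set<Integer>> followersFailingPerLeader(Iterable<String> logLines) {
        Map<Integer, Set<Integer>> byLeader = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = FETCH_ERROR.matcher(line);
            if (m.find()) {
                int follower = Integer.parseInt(m.group(1));
                int leader = Integer.parseInt(m.group(2));
                byLeader.computeIfAbsent(leader, k -> new TreeSet<>()).add(follower);
            }
        }
        return byLeader;
    }
}
```

If the logs of every non-ISR broker are combined, a single leader with all other brokers in its set is a strong hint that this leader, rather than the followers, is the broker that needs attention. Stack-trace lines are ignored because they do not carry the fetcher prefix.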
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
