kafka-jira mailing list archives

From "Jun Rao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-5007) Kafka Replica Fetcher Thread- Resource Leak
Date Fri, 16 Jun 2017 02:00:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051289#comment-16051289 ]

Jun Rao commented on KAFKA-5007:

[~huxi_2b], that's a good thought. If we hit an unhandled exception in Selector.connect(),
the ReplicaFetcherThread should log a warning "Error in fetch to broker" with a stack trace.
[~joseph.aliase07@gmail.com], do you see that in the broker log?
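
The failure mode under discussion can be sketched in isolation. What follows is a minimal Java illustration, not Kafka's actual Selector code; the class and method names (ConnectSketch, connectOrClose) are hypothetical. It assumes a caller that opens a channel and retries on failure, and shows why the channel has to be closed when the connect attempt throws: otherwise each retry would leave one more descriptor open.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.SocketChannel;

// Hypothetical sketch; names are illustrative and not taken from Kafka.
public class ConnectSketch {

    // Starts a non-blocking connect on an already opened channel.
    // If anything goes wrong, the channel is closed before the error
    // propagates, so a retry loop in the caller cannot pile up descriptors.
    public static void connectOrClose(SocketChannel channel, SocketAddress address)
            throws IOException {
        try {
            channel.configureBlocking(false);
            channel.connect(address);
        } catch (IOException | RuntimeException e) {
            channel.close(); // release the file descriptor
            throw new IOException("Connect to " + address + " failed", e);
        }
    }

    public static void main(String[] args) throws IOException {
        SocketChannel channel = SocketChannel.open();
        // An unresolved address makes connect() throw UnresolvedAddressException,
        // which stands in here for any unexpected failure during connect.
        SocketAddress address = InetSocketAddress.createUnresolved("broker.invalid", 9092);
        try {
            connectOrClose(channel, address);
        } catch (IOException e) {
            System.out.println("connect failed, channel open: " + channel.isOpen());
        }
    }
}
```

Catching RuntimeException alongside IOException is the point of the sketch: an unchecked exception that bypasses the cleanup path is the kind of unhandled case the comment above asks about.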

> Kafka Replica Fetcher Thread- Resource Leak
> -------------------------------------------
>                 Key: KAFKA-5007
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5007
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, network
>    Affects Versions:,,
>         Environment: CentOS 7
> Java 8
>            Reporter: Joseph Aliase
>            Priority: Critical
>              Labels: reliability
>         Attachments: jstack-kafka.out, jstack-zoo.out, lsofkafka.txt, lsofzookeeper.txt
> Kafka runs out of open file descriptors when the system's network interface is down.
> Issue description:
> We have a Kafka cluster of 5 nodes running on version [...]. The open file descriptor
> limit for the account running Kafka is set to 100000.
> During an upgrade, the network interface went down. The outage continued for 12 hours,
> and eventually all the brokers crashed with a java.io.IOException: Too many open files error.
> We repeated the test in a lower environment and observed that the open socket count keeps
> increasing while the NIC is down.
> We have around 13 topics with a maximum partition count of 120, and the number of replica
> fetcher threads is set to 8.
> Using an internal monitoring tool, we observed that the open socket descriptor count for the
> broker pid continued to increase while the NIC was down, leading to the open file descriptor error.
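
The reporter's internal monitoring tool is not described. As one possible stand-in, a JVM on a Unix-like system exposes its own descriptor usage through com.sun.management.UnixOperatingSystemMXBean. The sketch below (the class name FdCountSketch is hypothetical) reads the count and the limit for the JVM it runs in; watching a separate broker process would need a remote JMX connection or an OS tool such as lsof, as used for the attachments.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.OptionalLong;

// Hypothetical sketch; reports descriptor usage of the current JVM only.
public class FdCountSketch {

    // Number of file descriptors (files, sockets, pipes) currently open,
    // or empty when the platform bean does not expose it (e.g. non-Unix).
    public static OptionalLong openFdCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return OptionalLong.of(
                ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount());
        }
        return OptionalLong.empty();
    }

    // Per-process descriptor limit, the value that "Too many open files" hits.
    public static OptionalLong maxFdCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return OptionalLong.of(
                ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount());
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        System.out.println("open descriptors: " + openFdCount());
        System.out.println("descriptor limit: " + maxFdCount());
    }
}
```

Sampling openFdCount() at a fixed interval while the NIC is down would show whether the count climbs steadily toward the limit, which is the pattern described in the report.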

This message was sent by Atlassian JIRA
