kafka-jira mailing list archives

From "Apurva Mehta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-5781) Frequent long produce latency periods that result in reduced produce rate.
Date Tue, 29 Aug 2017 16:11:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145563#comment-16145563

Apurva Mehta commented on KAFKA-5781:

This looks eerily similar to: https://issues.apache.org/jira/browse/KAFKA-4614

Which file system are you using? What do your JVM metrics look like during the times of the

> Frequent long produce latency periods that result in reduced produce rate.
> --------------------------------------------------------------------------
>                 Key: KAFKA-5781
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5781
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions:
>         Environment: CentOS Linux release 7.3.1611 , Kernel 3.10, java version "1.8.0_121"
>            Reporter: Raoufeh Hashemian
>         Attachments: frequent_latency_increase_diskactivity.png, frequent_latency_increase.png,
> When we upgraded from Kafka 0.10.2 to 0.11.0, I started to see frequent throughput drops
with a predictable pattern (attached file shows the pattern in a 14-hour period). This resulted
in a degradation of up to 30% in our overall produce throughput.
> The drops correlate with a significant increase in 99th percentile latency (up
to 4 seconds). We have a cluster of 6 brokers and a single topic. The problem happens both
with and without consumers running, so I only included a case without consumers.
> There is no specific message in the broker logs when the latency surge happens. However,
I found a correlation between the log rotation messages in the log and the longer cycles
in the pattern (details shown in the attached graph: frequent_latency_increase.png).
> Each period of increased latency lasts 5 to 20 minutes (shown in the zoomed graph
in the attached files).
> The broker CPU utilization goes down during this time and some disk read activity is
observed (see attached graph).
> This pattern started to appear in our environment exactly at the time when we switched
to Kafka 0.11.0. We kept idempotence set to false and didn't make any configuration changes
when we switched. So I was wondering if it could be a bug or a configuration that needs to be
changed after the upgrade?

This message was sent by Atlassian JIRA
