kafka-dev mailing list archives

From "Greg Fodor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-3811) Introduce Kafka Streams metrics recording levels
Date Fri, 10 Jun 2016 17:13:21 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324838#comment-15324838
] 

Greg Fodor commented on KAFKA-3811:
-----------------------------------

Hey [~aartigupta], I attached a YourKit profiler to one of our jobs running dark against
production data. The job has 200-300 topic-partition pairs, generally discards most messages
early in the pipeline, and was processing a few thousand tps from the top-level topics. Unfortunately,
since this issue came up we have made changes to reduce the amount of data running through
the system (discarding it earlier), so we no longer hit this performance problem.
In my tests, the majority of the job's CPU time was spent walking and
emitting to the Sensors for the per-message process metrics and the per-k/v read/write latency
metrics. I also found that 6-7% of the time was spent in the fetcher metrics, which was addressed
here: https://github.com/apache/kafka/pull/1464. 

Good news: I managed to find the snapshot data :) I will attach it here. The majority of the
time is *not* the milliseconds() call but the actual (synchronized?) walk of Sensors in Sensor.record.
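To make the cost concrete, here is a minimal sketch of the hot-path pattern the profile points at. This is illustrative only, not Kafka's actual Sensor implementation: the class name, lock, and stat interface are assumptions. The point is that record() takes a lock and walks every registered stat, so the full walk is paid once per message even when nobody is reading the metrics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleConsumer;

// Hypothetical sketch of a sensor whose record() synchronizes and walks
// all registered stats on every call (the pattern seen in the profile).
public class SensorSketch {
    private final List<DoubleConsumer> stats = new ArrayList<>();
    private final Object lock = new Object();

    // Register a stat (e.g. a running sum, max, or latency histogram).
    public void add(DoubleConsumer stat) {
        stats.add(stat);
    }

    // Called once per record on the hot path: serializes all recording
    // threads on the lock, then visits every stat.
    public void record(double value) {
        synchronized (lock) {
            for (DoubleConsumer s : stats) {
                s.accept(value);
            }
        }
    }

    public static void main(String[] args) {
        SensorSketch sensor = new SensorSketch();
        double[] sum = {0.0};
        sensor.add(v -> sum[0] += v);
        sensor.record(2.0);
        sensor.record(3.0);
        System.out.println(sum[0]); // 5.0
    }
}
```

With hundreds of topic-partitions and per-message sensors, that lock plus walk is taken thousands of times per second, which matches the profile showing Sensor.record rather than milliseconds() as the dominant cost.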

> Introduce Kafka Streams metrics recording levels
> ------------------------------------------------
>
>                 Key: KAFKA-3811
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3811
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Greg Fodor
>            Assignee: aarti gupta
>
> Follow-up from the discussions here:
> https://github.com/apache/kafka/pull/1447
> https://issues.apache.org/jira/browse/KAFKA-3769
> The proposal is to introduce configuration to control the granularity/volume of metrics
> emitted by Kafka Streams jobs, since the per-record-level metrics introduce non-trivial
> overhead and are possibly less useful once a job has been optimized.
> Proposal from guozhangwang:
> level0 (stream thread global): per-record process / punctuate latency, commit latency,
> poll latency, etc.
> level1 (per processor node, and per state store): IO latency, per-record .. latency,
> forward throughput, etc.
> And by default we only turn on level0.
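The two-level proposal above could be gated with a simple recording-level check; a hypothetical sketch follows (the enum, names, and shouldRecord method are illustrative assumptions, not the actual Kafka API). Each sensor is tagged with a level, and recording becomes a cheap no-op when the configured level is less verbose than the sensor's level.

```java
// Hypothetical recording-level gate for the level0/level1 proposal
// (names are illustrative, not the real Kafka Streams config API).
public class RecordingLevels {
    enum RecordingLevel {
        INFO(0),   // level0: stream-thread-global metrics (process/commit/poll latency)
        DEBUG(1);  // level1: per-processor-node and per-state-store metrics

        final int id;
        RecordingLevel(int id) { this.id = id; }

        // A sensor at sensorLevel records only if the configured level
        // is at least as verbose as the sensor's own level.
        static boolean shouldRecord(RecordingLevel configured, RecordingLevel sensorLevel) {
            return configured.id >= sensorLevel.id;
        }
    }

    public static void main(String[] args) {
        // Default (INFO): level0 sensors record, level1 (DEBUG) sensors skip
        // the synchronized stat walk entirely.
        System.out.println(RecordingLevel.shouldRecord(RecordingLevel.INFO, RecordingLevel.INFO));   // true
        System.out.println(RecordingLevel.shouldRecord(RecordingLevel.INFO, RecordingLevel.DEBUG));  // false
        System.out.println(RecordingLevel.shouldRecord(RecordingLevel.DEBUG, RecordingLevel.DEBUG)); // true
    }
}
```

With a gate like this checked before Sensor.record, the per-processor-node and per-state-store sensors cost a single integer comparison per message unless the user opts in to the more verbose level.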



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
