kafka-jira mailing list archives

From "Ewen Cheslack-Postava (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-5741) Prioritize threads in Connect distributed worker process
Date Wed, 06 Sep 2017 18:38:01 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155857#comment-16155857 ]

Ewen Cheslack-Postava commented on KAFKA-5741:
----------------------------------------------

It would be good to have clear evidence that this is actually a problem in practice, i.e. that
other threads starving the herder thread caused it to rebalance. First, heartbeating actually
happens in a background thread, so you'd have to starve that thread as well to hit the session
timeout. And the actual processing done in the thread is very minimal, so you'd have to completely
starve that thread for a long time. It's much more likely that things like waiting for other
threads to flush data during a rebalance are what cause it to fall out of the group.
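
To make the "starve it for a long time" point concrete, here is a minimal sketch of the
arithmetic, not Kafka's actual coordinator code. It uses logical time only. The class name,
method name, and the 3000 ms / 10000 ms values are assumptions chosen for illustration.

```java
// Toy model (hypothetical, not Kafka code): a heartbeat thread ticks every
// heartbeatIntervalMs, but sends nothing while it is stalled. The member is
// considered dropped once the gap since the last heartbeat exceeds the
// session timeout.
final class SessionTimeoutSketch {

    public static boolean memberDropped(long heartbeatIntervalMs,
                                        long sessionTimeoutMs,
                                        long stallStartMs,
                                        long stallEndMs,
                                        long horizonMs) {
        long lastHeartbeatMs = 0;
        for (long now = 0; now <= horizonMs; now += heartbeatIntervalMs) {
            // Check the gap first: a heartbeat that arrives after the
            // timeout has already elapsed is too late to help.
            if (now - lastHeartbeatMs > sessionTimeoutMs) {
                return true;
            }
            boolean stalled = now >= stallStartMs && now < stallEndMs;
            if (!stalled) {
                lastHeartbeatMs = now;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Heartbeat thread starved for 6 s, then for 12 s, with a 10 s timeout.
        System.out.println(memberDropped(3000, 10000, 3000, 9000, 30000));
        System.out.println(memberDropped(3000, 10000, 3000, 15000, 30000));
    }
}
```

In this model a single missed heartbeat is harmless; the heartbeat thread has to miss
several in a row before the session timeout is exceeded.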

I'm also skeptical of the prioritization because, if this really occurred for this reason,
it would suggest to me that the hardware is simply underprovisioned for the workload. Prioritizing
the DistributedHerder thread would probably just end up starving other threads if there really
is that much resource contention, so the connectors wouldn't be functioning correctly
anyway.

> Prioritize threads in Connect distributed worker process
> --------------------------------------------------------
>
>                 Key: KAFKA-5741
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5741
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 0.11.0.0
>            Reporter: Randall Hauch
>            Priority: Critical
>
> Connect's distributed worker process uses the {{DistributedHerder}} to perform all administrative
> operations, including: starting, stopping, pausing, resuming, reconfiguring connectors; rebalancing;
> etc. The {{DistributedHerder}} uses a single-threaded executor service to do all this work
> and to do it sequentially. If this thread gets preempted for any reason (e.g., connector tasks
> are bogging down the process, DoS, etc.), then the herder's membership in the group may be
> dropped, causing a rebalance.
> This herder thread should be run at a much higher priority than all of the other threads
> in the system.
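
For reference, the proposal in the description could be sketched roughly as follows. This is
a generic illustration using only java.util.concurrent, not the actual DistributedHerder code;
the class name, method name, and thread name are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Hypothetical sketch: a single-threaded executor whose one worker thread
// is created at the maximum Java thread priority.
final class HerderPrioritySketch {

    public static ExecutorService newHighPrioritySingleThreadExecutor(String threadName) {
        ThreadFactory factory = runnable -> {
            Thread thread = new Thread(runnable, threadName);
            // Priority is only a hint to the scheduler; it guarantees nothing.
            thread.setPriority(Thread.MAX_PRIORITY);
            return thread;
        };
        return Executors.newSingleThreadExecutor(factory);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = newHighPrioritySingleThreadExecutor("herder-sketch");
        // Tasks still run sequentially on the single worker thread.
        int priority = executor.submit(() -> Thread.currentThread().getPriority()).get();
        System.out.println("worker priority = " + priority);
        executor.shutdown();
    }
}
```

Java thread priorities are advisory, and on some platforms and JVM configurations they may
have little or no effect on scheduling, so a change like this would not by itself guarantee
that the herder thread keeps running under heavy contention.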



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
