kafka-jira mailing list archives

From "Randall Hauch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-5896) Kafka Connect task threads never interrupted
Date Fri, 09 Feb 2018 17:00:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358670#comment-16358670
] 

Randall Hauch commented on KAFKA-5896:
--------------------------------------

[~ewencp], do you have any thoughts on this? I know in the past you've talked about not wanting
to interrupt task threads, since some developers won't handle interruption properly. I agree
that not everyone implements it correctly, but we could at least _try_ to cancel the tasks.
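
To illustrate the concern, here is a minimal sketch in plain Java (no Kafka Connect classes; every name in it is hypothetical). It contrasts a task that swallows the interrupt and keeps looping with one that treats the interrupt as a request to exit.

```java
// Hypothetical illustration only; these are not Kafka Connect classes.
class InterruptHandlingDemo {

    /** A task that swallows InterruptedException, so an interrupt never ends it. */
    static Runnable swallowingTask() {
        return () -> {
            while (true) {
                try {
                    Thread.sleep(10_000); // stands in for blocking work
                } catch (InterruptedException e) {
                    // Swallowed: the interrupt status is cleared and the loop continues.
                }
            }
        };
    }

    /** A task that honors the interrupt and leaves its loop. */
    static Runnable cooperativeTask() {
        return () -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(10_000); // stands in for blocking work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the status, then exit
            }
        };
    }

    /** Starts the task on a daemon thread, interrupts it, and reports whether it is still alive. */
    static boolean aliveAfterInterrupt(Runnable task) {
        Thread thread = new Thread(task, "hypothetical-task-thread");
        thread.setDaemon(true); // lets the JVM exit even if the task never stops
        thread.start();
        thread.interrupt();
        try {
            thread.join(500); // wait briefly for the task to react
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return thread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("swallowing task alive:  " + aliveAfterInterrupt(swallowingTask()));
        System.out.println("cooperative task alive: " + aliveAfterInterrupt(cooperativeTask()));
    }
}
```

In this sketch the swallowing task survives the interrupt while the cooperative one exits, which is why interruption can only ever be a best-effort attempt.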

> Kafka Connect task threads never interrupted
> --------------------------------------------
>
>                 Key: KAFKA-5896
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5896
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>            Reporter: Nick Pillitteri
>            Assignee: Nick Pillitteri
>            Priority: Minor
>
> h2. Problem
> Kafka Connect tasks associated with connectors run in their own threads. When tasks are
> stopped or restarted, a flag ({{stopping}}) is set to indicate that the task should stop
> processing records. However, if the thread the task is running in is blocked (waiting for
> a lock or performing I/O), the task may never stop.
> I've created a connector specifically to demonstrate this issue (along with more detailed
> instructions for reproducing it): https://github.com/smarter-travel-media/hang-connector
> I believe this is an issue because a single badly behaved connector (any connector that
> does I/O without timeouts) can put the Kafka Connect worker into a state where the only
> solution is to restart the JVM.
> I suspect, though I couldn't reproduce it, that this is the cause of this problem on Stack Overflow:
> https://stackoverflow.com/questions/43802156/inconsistent-connector-state-connectexception-task-already-exists-in-this-work
> h2. Expected Result
> I would expect the Worker to eventually interrupt the thread that the task is running
> in. In various other libraries, this is what I've seen done when a thread needs to be
> forcibly stopped.
> h2. Actual Result
> In actuality, the Worker sets a {{stopping}} flag and lets the thread run indefinitely.
> It uses a timeout while waiting for the task to stop, but after this timeout has expired it
> simply sets a {{cancelled}} flag. This means that every time a task is restarted, a new
> thread running the task is created. Thus a task may end up with multiple instances, each
> running in its own thread, when there is only supposed to be a single thread.
> h2. Steps to Reproduce
> The problem can be replicated by using the connector available here: https://github.com/smarter-travel-media/hang-connector
> Apologies for how involved the steps are.
> I've created a patch that forcibly interrupts threads after they fail to shut down gracefully:
> https://github.com/smarter-travel-media/kafka/commit/295c747a9fd82ee8b30556c89c31e0bfcce5a2c5
> I've confirmed that this fixes the issue. I can add some unit tests and submit a PR if
> people agree that this is a bug and interrupting threads is the right fix.
> Thanks!
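
The behavior described in the issue can be sketched in plain Java. This is a hypothetical stand-in, not the Worker's actual code or the linked patch: `BlockingTask`, its latches, and the timeouts are assumptions made for illustration. The task checks a `stopping` flag but is blocked on a call that never returns, so setting the flag and waiting with a timeout does not stop it, while a subsequent interrupt does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; not Kafka Connect's Worker or task classes.
class HangingTaskDemo {

    static class BlockingTask implements Runnable {
        private final AtomicBoolean stopping = new AtomicBoolean(false);
        private final CountDownLatch started = new CountDownLatch(1);
        private final CountDownLatch finished = new CountDownLatch(1);
        // Never counted down: simulates a lock or I/O call that never returns.
        private final CountDownLatch neverReleased = new CountDownLatch(1);

        @Override
        public void run() {
            try {
                while (!stopping.get()) {
                    started.countDown();
                    neverReleased.await(); // blocked here, so the flag is never re-checked
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the status and exit
            } finally {
                finished.countDown();
            }
        }

        void stop() {
            stopping.set(true);
        }

        boolean awaitStop(long timeoutMs) throws InterruptedException {
            return finished.await(timeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    /** Returns {stopped after flag and timeout, stopped after interrupt}. */
    static boolean[] demonstrate() {
        BlockingTask task = new BlockingTask();
        Thread thread = new Thread(task, "hypothetical-task-thread");
        thread.setDaemon(true);
        thread.start();
        try {
            task.started.await();

            // Step 1: graceful stop, i.e. set the flag and wait with a timeout.
            task.stop();
            boolean stoppedByFlag = task.awaitStop(200);

            // Step 2: the fallback proposed in the issue, i.e. interrupt the thread.
            thread.interrupt();
            boolean stoppedByInterrupt = task.awaitStop(2000);

            return new boolean[] {stoppedByFlag, stoppedByInterrupt};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demonstrate();
        System.out.println("stopped by flag alone: " + result[0]);
        System.out.println("stopped by interrupt:  " + result[1]);
    }
}
```

Here the flag alone leaves the thread blocked and the interrupt releases it. Note that an interrupt only helps when the blocking call is interruptible. Blocking reads on classic `java.io` streams, for example, generally do not respond to `Thread.interrupt()`, so interruption would remain a best-effort measure.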



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
