kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Greg Fodor (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (KAFKA-4043) User-defined handler for topology restart
Date Thu, 08 Sep 2016 00:39:20 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Greg Fodor resolved KAFKA-4043.
    Resolution: Not A Problem

> User-defined handler for topology restart
> -----------------------------------------
>                 Key: KAFKA-4043
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4043
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions:
>            Reporter: Greg Fodor
>            Assignee: Guozhang Wang
> Since Kafka Streams is just a library, there's a lot of cool stuff we've been able to
do that would be trickier if it were part of a larger cluster-oriented job execution system
that had assumptions about the semantics of a job. One of the jobs we have uses Kafka Streams
to do top level data flow, and then one of our processors actually will kick off background
threads to do work based upon the data flow state. Happy to fill in more details of our use-case,
but fundamentally the model is that we have a Kafka Streams data flow that is reading state
from upstream, and that state dictates that work needs to be done, which results in a dedicated
work thread to be spawned by our job.
> This works great, but we're running into an issue when there is partition reassignment,
since we have no way to detect this and cleanly shut down these threads. In our case, we'd
like to shut down the background worker threads if there is a partition rebalance or if the
job raises an exception and attempts to restart. In practice what is happening is we are getting
duplicate threads for the same work on a partition rebalance.
> Implementation-wise, this seems like some type of event handler that can be attached
to the topology at build time that can will be called when the data flow needs to rebalance
or rebuild its task threads in general (ideally passing as much information about the reason
along.) I could imagine this being factored similarly to the KafkaStreams#setUncaughtExceptionHandler.

This message was sent by Atlassian JIRA

View raw message