kafka-jira mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-6634) Delay initiating the txn on producers until initializeTopology with EOS turned on
Date Sun, 11 Mar 2018 18:29:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394595#comment-16394595 ]

ASF GitHub Bot commented on KAFKA-6634:

guozhangwang opened a new pull request #4684: KAFKA-6634: Delay starting new transaction in
URL: https://github.com/apache/kafka/pull/4684
   1. As titled, do not start a new transaction right away, since during restoration the producer would have
no activity, which may cause the txn to expire.
   1.a. Also delay starting a new txn when resuming until the topology is initialized.
   2. Fixed a minor bug: when the resuming process hits a migration exception, we should
remove that task from the running list if possible.
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

> Delay initiating the txn on producers until initializeTopology with EOS turned on
> ---------------------------------------------------------------------------------
>                 Key: KAFKA-6634
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6634
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>            Priority: Major
> In the Streams EOS implementation, the producer created for each task initiates a txn immediately
after being created, in the constructor of `StreamTask`. However, the task may not process
any data, and hence the producer may not send any records for that started txn, for a long time
because of the restoration process. With the default transaction timeout (`transaction.timeout.ms`) of 60 seconds,
this means that if the restoration takes longer than that, then upon starting to process the producer
will immediately get an error that its producer epoch is already old.
> To fix this, we should consider initiating the txn only after the restoration phase
is done. One caveat is that if the producer is already fenced, it will not
be notified until then, in initializeTopology. But I think this should not be a correctness
issue, since during the restoration process we do not make any changes to the processing state.
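
The proposed ordering can be illustrated with a minimal, self-contained sketch. It is not the actual `StreamTask` code: `TxnProducer`, `SketchTask`, `RecordingProducer` and `restore()` are hypothetical names made up for this example, and only the method names `initTransactions` and `beginTransaction` mirror the real `KafkaProducer` API. The sketch assumes the constructor still registers the producer for transactions and that only the start of the txn is moved to `initializeTopology`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two transactional calls used in this sketch;
// the method names mirror KafkaProducer#initTransactions and #beginTransaction.
interface TxnProducer {
    void initTransactions();
    void beginTransaction();
}

// Hypothetical, simplified task. This is NOT the real StreamTask.
class SketchTask {
    private final TxnProducer producer;
    private final boolean eosEnabled;
    private boolean transactionInFlight = false;

    SketchTask(TxnProducer producer, boolean eosEnabled) {
        this.producer = producer;
        this.eosEnabled = eosEnabled;
        if (eosEnabled) {
            // Assumption: the producer is still initialized for transactions here,
            // but no txn is begun in the constructor any more.
            producer.initTransactions();
        }
    }

    // Placeholder for the restoration phase: it never touches the producer,
    // so no txn needs to be open while it runs.
    void restore() { }

    // Called once restoration is done; only now is the txn begun.
    void initializeTopology() {
        if (eosEnabled && !transactionInFlight) {
            producer.beginTransaction();
            transactionInFlight = true;
        }
    }

    boolean transactionInFlight() {
        return transactionInFlight;
    }
}

// Test double that records the order of producer calls.
class RecordingProducer implements TxnProducer {
    final List<String> calls = new ArrayList<>();

    @Override
    public void initTransactions() { calls.add("initTransactions"); }

    @Override
    public void beginTransaction() { calls.add("beginTransaction"); }
}

class DelayedTxnSketch {
    public static void main(String[] args) {
        RecordingProducer producer = new RecordingProducer();
        SketchTask task = new SketchTask(producer, true);
        task.restore();                      // however long this takes, no txn is open
        System.out.println(producer.calls);
        task.initializeTopology();           // the txn begins only here
        System.out.println(producer.calls);
    }
}
```

With this ordering, the length of the restoration phase no longer counts against an open transaction, at the cost noted above: a producer that has already been fenced would only find out once it starts using the txn after `initializeTopology`.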

This message was sent by Atlassian JIRA
