kafka-jira mailing list archives

From "Ismael Juma (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-6098) Delete and Re-create topic operation could result in race condition
Date Thu, 16 Nov 2017 13:42:00 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismael Juma updated KAFKA-6098:
    Labels: reliability  (was: )

> Delete and Re-create topic operation could result in race condition
> -------------------------------------------------------------------
>                 Key: KAFKA-6098
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6098
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>              Labels: reliability
>             Fix For: 1.1.0
> The following steps reproduce this issue:
> 1. Delete a topic using the delete topic request.
> 2. Confirm the topic is deleted using the list topics request.
> 3. Create the topic using the create topic request.
> In step 3, a race condition can cause the response to return a {{TOPIC_ALREADY_EXISTS}}
error code, indicating that the topic still exists.
> The root cause of the above issue is in the {{TopicDeletionManager}} class:
> {code}
> controller.partitionStateMachine.handleStateChanges(partitionsForDeletedTopic.toSeq,
> controller.partitionStateMachine.handleStateChanges(partitionsForDeletedTopic.toSeq,
> topicsToBeDeleted -= topic
> partitionsToBeDeleted.retain(_.topic != topic)
> kafkaControllerZkUtils.deleteTopicZNode(topic)
> kafkaControllerZkUtils.deleteTopicConfigs(Seq(topic))
> kafkaControllerZkUtils.deleteTopicDeletions(Seq(topic))
> controllerContext.removeTopic(topic)
> {code}
> I.e. it first updates the brokers' metadata cache through the ISR and metadata update
requests, then deletes the topic zk path, and then deletes the topic-deletion zk path. However,
upon handling the create topic request, the broker simply tries to write to the topic zk
path directly. Hence there is a race window between the brokers updating their metadata
cache (after which the list topics request no longer returns this topic) and the zk path for
the topic being deleted (after which the create topic request can succeed). A create topic
request that arrives inside this window gets {{TOPIC_ALREADY_EXISTS}}.
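
The ordering described above can be sketched as a toy model. This is only an illustration under stated assumptions, not Kafka code: the class name, the two sets standing in for the brokers' metadata cache and the ZooKeeper topic paths, and all method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the deletion ordering; none of these names exist in Kafka.
class TopicDeletionRaceSketch {
    // Hypothetical stand-ins for the brokers' metadata cache and the topic zk paths.
    final Set<String> metadataCache = new HashSet<>();
    final Set<String> zkTopicPaths = new HashSet<>();

    // Registers an existing topic in both places.
    void addTopic(String topic) {
        metadataCache.add(topic);
        zkTopicPaths.add(topic);
    }

    // First half of deletion: brokers drop the topic from their metadata cache.
    void updateMetadataCacheForDeletion(String topic) {
        metadataCache.remove(topic);
    }

    // Second half of deletion: the topic zk path is removed.
    void deleteTopicZkPath(String topic) {
        zkTopicPaths.remove(topic);
    }

    // The list topics request is answered from the metadata cache.
    boolean listContains(String topic) {
        return metadataCache.contains(topic);
    }

    // The create topic request writes to the zk path directly and fails if it exists.
    String createTopic(String topic) {
        if (!zkTopicPaths.add(topic)) {
            return "TOPIC_ALREADY_EXISTS";
        }
        metadataCache.add(topic);
        return "NONE";
    }

    public static void main(String[] args) {
        TopicDeletionRaceSketch cluster = new TopicDeletionRaceSketch();
        cluster.addTopic("t");

        cluster.updateMetadataCacheForDeletion("t");
        // Inside the window: the topic is no longer listed, yet create is rejected.
        System.out.println("listed: " + cluster.listContains("t"));
        System.out.println("create inside window: " + cluster.createTopic("t"));

        cluster.deleteTopicZkPath("t");
        // After the zk path is gone, the same create request goes through.
        System.out.println("create after window: " + cluster.createTopic("t"));
    }
}
```

In this model the window is the gap between `updateMetadataCacheForDeletion` and `deleteTopicZkPath`; in a real cluster those steps run asynchronously, so the width of the window is not under the client's control.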
> This problem is exposed by the current handling logic of the create topic response:
most callers take {{TOPIC_ALREADY_EXISTS}} as "OK" and move on. The zk path is then
deleted later, leaving the topic not created at all:
> https://github.com/apache/kafka/blob/249e398bf84cdd475af6529e163e78486b43c570/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamsKafkaClient.java#L221
> https://github.com/apache/kafka/blob/1a653c813c842c0b67f26fb119d7727e272cf834/connect/runtime/src/main/java/org/apache/kafka/connect/util/TopicAdmin.java#L232
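
The effect of that lenient handling can be illustrated with a small sketch. It is not the code behind the two links above; the class, the set standing in for the topic zk paths, and the `ensureTopicLenient` helper are hypothetical names chosen for the illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Toy illustration of lenient create handling; not StreamsKafkaClient or TopicAdmin code.
class LenientCreateSketch {
    // Hypothetical stand-in for the topic zk paths held by the cluster.
    static final Set<String> zkTopicPaths = new HashSet<>();

    // Mimics the create topic request: rejected while the zk path still exists.
    static String createTopic(String topic) {
        return zkTopicPaths.add(topic) ? "NONE" : "TOPIC_ALREADY_EXISTS";
    }

    // Lenient handling: TOPIC_ALREADY_EXISTS is treated the same as success.
    static boolean ensureTopicLenient(String topic) {
        String error = createTopic(topic);
        return error.equals("NONE") || error.equals("TOPIC_ALREADY_EXISTS");
    }

    public static void main(String[] args) {
        // Deletion is in progress: the zk path has not been removed yet.
        zkTopicPaths.add("t");
        boolean consideredReady = ensureTopicLenient("t");

        // The pending deletion completes after the client has moved on.
        zkTopicPaths.remove("t");

        System.out.println("client considered topic ready: " + consideredReady); // prints true
        System.out.println("topic exists afterwards: " + zkTopicPaths.contains("t")); // prints false
    }
}
```

The client proceeds as if the topic were available, while the end state has no topic at all.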
> Looking at the code history, it seems this race condition has always existed, but when
testing on trunk / 1.0 with the above steps it is more likely to happen than before. I wonder
if the ZK async calls have an effect here. cc [~junrao] [~onurkaraman]

This message was sent by Atlassian JIRA
