kafka-jira mailing list archives

From "Jiangjie Qin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-6029) Controller should wait for the leader migration to finish before ack a ControlledShutdownRequest
Date Wed, 11 Oct 2017 02:22:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199706#comment-16199706
] 

Jiangjie Qin commented on KAFKA-6029:
-------------------------------------

[~junrao] Good point. That seems more likely to happen. Just to check that I understand correctly,
are you suggesting the following solution?
1. Give each broker an epoch that changes on every restart.
2. During controlled shutdown, the controller sends a LeaderAndIsrRequest containing the new ISR
plus the shutting-down broker and its epoch.
3. Add the broker epoch to the FetchRequest, so that each follower sends its broker epoch with
every FetchRequest.
4. If the leader sees a FetchRequest whose broker and epoch match the shutting-down broker,
it will not add that broker back to the ISR.
5. After the broker restarts, the leaders will see a new broker epoch and add the restarted
broker back to the ISR.
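
To make the five steps above concrete, here is a minimal sketch of the leader-side check in
Python. It is only an illustration of the proposal, not Kafka code: the class PartitionLeader,
its methods, and the caught_up flag are all hypothetical names, and the real request formats
are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class PartitionLeader:
    """Toy leader-side state for one partition (hypothetical, not Kafka code)."""

    isr: Set[int] = field(default_factory=set)
    # broker id -> broker epoch that the controller marked as shutting down
    shutting_down: Dict[int, int] = field(default_factory=dict)

    def on_leader_and_isr(self, new_isr: Iterable[int],
                          shutting_down_broker: int, broker_epoch: int) -> None:
        """Step 2: apply the new ISR and remember the shutting-down broker and epoch."""
        self.isr = set(new_isr)
        self.shutting_down[shutting_down_broker] = broker_epoch

    def on_fetch(self, broker_id: int, broker_epoch: int, caught_up: bool) -> bool:
        """Steps 3-5: handle a fetch that carries the follower's broker epoch.

        Returns True if the follower is in the ISR after this fetch.
        """
        if self.shutting_down.get(broker_id) == broker_epoch:
            # Step 4: same broker and same epoch as the one shutting down,
            # so do not add it back to the ISR.
            return False
        # Step 5: a different epoch means the broker has restarted,
        # so the shutdown marker no longer applies.
        self.shutting_down.pop(broker_id, None)
        if caught_up:
            self.isr.add(broker_id)
        return broker_id in self.isr
```

With this sketch, a fetch from broker 3 at the epoch marked as shutting down is ignored for ISR
purposes, while a fetch from broker 3 at a newer epoch is allowed to rejoin once it has caught up.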



> Controller should wait for the leader migration to finish before ack a ControlledShutdownRequest
> ------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6029
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6029
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller, core
>    Affects Versions: 1.0.0
>            Reporter: Jiangjie Qin
>             Fix For: 1.1.0
>
>
> In the controlled shutdown process, the controller returns the ControlledShutdownResponse
immediately after the state machine is updated. Because the LeaderAndIsrRequests and UpdateMetadataRequests
may not yet have been processed successfully by the brokers, the leader migration and the active ISR
shrink may not have finished when the shutting-down broker proceeds to shut down. As a result,
some of the leaders can take up to replica.lag.time.max.ms to kick the broker out of the ISR. Meanwhile,
the produce purgatory size will grow.
> Ideally, the controller should wait until all the LeaderAndIsrRequests and UpdateMetadataRequests
have been acked before sending back the ControlledShutdownResponse.
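
The "wait until all requests have been acked" behavior described above can be sketched as a
simple pending-set tracker. This is a rough illustration in Python under stated assumptions:
ControlledShutdownTracker and its methods are hypothetical names, requests are identified by a
(broker id, request type) pair, and timeouts and failed brokers are not handled.

```python
from typing import Iterable, Set, Tuple

# A request is identified here by (destination broker id, request type).
RequestKey = Tuple[int, str]


class ControlledShutdownTracker:
    """Tracks outstanding requests for one controlled shutdown (hypothetical)."""

    def __init__(self, pending_requests: Iterable[RequestKey]) -> None:
        # Every LeaderAndIsrRequest / UpdateMetadataRequest that was sent
        # because of this controlled shutdown and is not yet acked.
        self._pending: Set[RequestKey] = set(pending_requests)

    def on_ack(self, broker_id: int, request_type: str) -> bool:
        """Record an ack; return True once the response may be sent."""
        self._pending.discard((broker_id, request_type))
        return self.can_respond()

    def can_respond(self) -> bool:
        """The ControlledShutdownResponse is held back until nothing is pending."""
        return not self._pending
```

The controller in this sketch would send the ControlledShutdownResponse only when on_ack returns
True, rather than right after the state machine update.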



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
