kafka-dev mailing list archives

From "Jason Gustafson (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (KAFKA-7601) Handle message format downgrades during upgrade of message format version
Date Fri, 03 May 2019 01:12:00 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gustafson resolved KAFKA-7601.
       Resolution: Fixed
    Fix Version/s: 2.2.1

The patch fixes the main problem of ensuring the safety of leader election in the presence
of unexpected regressions of the message format. It does not solve the problem of down-conversion
with mixed message formats. 

> Handle message format downgrades during upgrade of message format version
> -------------------------------------------------------------------------
>                 Key: KAFKA-7601
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7601
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Priority: Major
>             Fix For: 2.1.2, 2.2.1
> During an upgrade of the message format, there is a short time during which the configured
message format version is not consistent across all replicas of a partition. As long as all
brokers are using the same binary version (i.e. all have been updated to the latest code),
this typically does not cause any problems. Followers will take whatever message format is
used by the leader. However, it is possible for leadership to change several times between
brokers which support the new format and those which support the old format. This can cause
the version used in the log to flap between the different formats until the upgrade is complete.

> For example, suppose broker 1 has been updated to use v2 and broker 2 is still on v1.
When broker 1 is the leader, all new messages will be written in the v2 format. When broker
2 is leader, v1 will be used. If there is any instability in the cluster or if completion
of the update is delayed, then the log will be seen to switch back and forth between v1 and
v2. Once the update is completed and broker 2 begins using v2, then the message format will
stabilize and everything will generally be ok.
> Downgrades of the message format are problematic, even if they are just temporary. There
are basically two issues:
> 1. We use the configured message format version to tell whether down-conversion is needed.
We assume that this is always the maximum version used in the log, but that assumption
fails in the case of a downgrade. In the worst case, old clients will see the new format and
likely fail.
> 2. The logic we use for finding the truncation offset during the become follower transition
does not handle flapping between message formats. When the new format is used by the leader,
then the epoch cache will be updated correctly. When the old format is in use, the epoch cache
won't be updated. This can lead to an incorrect result to OffsetsForLeaderEpoch queries.
> We have actually observed the second problem. The scenario went something like this.
Broker 1 is the leader of epoch 0 and writes some messages to the log using the v2 message
format. Broker 2 then becomes the leader for epoch 1 and writes some messages in the v1 format.
On broker 2, the last entry in the epoch cache is epoch 0. No entry is written for epoch 1
because it uses the old format. When broker 1 became a follower, it sent an OffsetsForLeaderEpoch
query to broker 2 for epoch 0. Since epoch 0 was the last entry in the cache, the log end
offset was returned. This resulted in localized log divergence.
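
The scenario above can be reproduced with a toy model. This is a minimal sketch, not Kafka's implementation: the class `ToyReplicaLog`, its methods, and the offsets used are all hypothetical, and the only behaviour taken from the report is that appends in the old format do not update the epoch cache.

```python
class ToyReplicaLog:
    """Toy replica log. Hypothetical model, not Kafka's actual classes."""

    def __init__(self):
        self.records = []      # list index == offset; items are (epoch, format)
        self.epoch_cache = []  # ascending list of (epoch, start_offset)

    def append(self, epoch, fmt):
        offset = len(self.records)
        self.records.append((epoch, fmt))
        # Assumption taken from the report: only v2 appends update the cache.
        if fmt >= 2 and (not self.epoch_cache or self.epoch_cache[-1][0] < epoch):
            self.epoch_cache.append((epoch, offset))

    def log_end_offset(self):
        return len(self.records)

    def end_offset_for_epoch(self, requested_epoch):
        """Toy answer to an OffsetsForLeaderEpoch query: the start offset of the
        next larger cached epoch, or the log end offset if there is none."""
        for epoch, start_offset in self.epoch_cache:
            if epoch > requested_epoch:
                return start_offset
        return self.log_end_offset()


def build_broker2_log(epoch1_format):
    """Broker 2's log: 3 records from epoch 0 (v2), then 2 records from epoch 1."""
    log = ToyReplicaLog()
    for _ in range(3):
        log.append(epoch=0, fmt=2)
    for _ in range(2):
        log.append(epoch=1, fmt=epoch1_format)
    return log


flapped = build_broker2_log(epoch1_format=1)  # epoch 1 written in the old format
stable = build_broker2_log(epoch1_format=2)   # epoch 1 written in the new format

print(flapped.epoch_cache, flapped.end_offset_for_epoch(0))
print(stable.epoch_cache, stable.end_offset_for_epoch(0))
```

In this toy run the flapped log answers the epoch 0 query with its log end offset (5) although epoch 0 really ended at offset 3, so a follower relying on that answer would not truncate the records it wrote past offset 3.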
> There are a few options to fix this problem. From a high level, we can either be stricter
about preventing downgrades of the message format, or we can add additional logic to make
downgrades safe. 
> (Disallow downgrades): As an example of the first approach, the leader could always use
the maximum of the last version written to the log and the configured message format version.
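
A rough sketch of that rule, under the assumption stated earlier that every broker runs the same binary and can therefore write the newer format. The function names and the leadership sequence are hypothetical and only illustrate the "maximum of last written and configured" idea.

```python
def effective_format_version(configured_version, last_written_version):
    """Format a leader would write under the 'disallow downgrades' idea."""
    if last_written_version is None:  # empty log: nothing to protect yet
        return configured_version
    return max(configured_version, last_written_version)


def simulate(leader_configs, use_max_rule):
    """Return the format of each batch as leadership moves between brokers.

    leader_configs lists the configured format of each successive leader.
    """
    written = []
    for configured in leader_configs:
        last = written[-1] if written else None
        if use_max_rule:
            version = effective_format_version(configured, last)
        else:
            version = configured
        written.append(version)
    return written


# Leadership flaps between broker 1 (configured v2) and broker 2 (configured v1).
flapping_leaders = [2, 1, 2, 1]
print(simulate(flapping_leaders, use_max_rule=False))
print(simulate(flapping_leaders, use_max_rule=True))
```

Without the rule the log alternates between v2 and v1. With it, the format never regresses once v2 has been written.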

> (Allow downgrades): If we want to allow downgrades, then it makes sense to invalidate
and remove all entries in the epoch cache following the message format downgrade. This would
basically force us to revert to truncation to the high watermark, which is what you'd expect
from a downgrade.  We would also need a solution for the problem of detecting when down-conversion
is needed for a fetch request. One option I've been thinking about is enforcing the invariant
that each segment uses only one message format version. Whenever the message format changes,
we need to roll a new segment. Then we can simply remember which format is in use by each
segment to tell whether down-conversion is needed for a given fetch request.
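
The one-format-per-segment invariant might look roughly like the following. This is an illustrative sketch only: `Segment`, `SegmentedLog`, and `needs_down_conversion` are hypothetical names, and a real fetch can span several segments, which this toy check ignores by looking only at the segment containing the fetch offset.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int
    fmt: int        # the single message format used by this segment
    count: int = 0  # number of records appended so far

    @property
    def next_offset(self):
        return self.base_offset + self.count


class SegmentedLog:
    """Toy log that rolls a new segment whenever the message format changes."""

    def __init__(self):
        self.segments = []

    def append(self, fmt):
        active = self.segments[-1] if self.segments else None
        if active is None or active.fmt != fmt:
            # Format changed (or the log is empty): roll a new segment.
            base = active.next_offset if active else 0
            active = Segment(base_offset=base, fmt=fmt)
            self.segments.append(active)
        offset = active.next_offset
        active.count += 1
        return offset

    def segment_for(self, offset):
        for segment in self.segments:
            if segment.base_offset <= offset < segment.next_offset:
                return segment
        raise IndexError(f"offset {offset} is out of range")

    def needs_down_conversion(self, fetch_offset, client_max_format):
        # Decide from the segment's recorded format, not the configured one.
        return self.segment_for(fetch_offset).fmt > client_max_format


log = SegmentedLog()
for fmt in [2, 2, 1, 1, 2]:
    log.append(fmt)

print([(s.base_offset, s.fmt, s.count) for s in log.segments])
print(log.needs_down_conversion(fetch_offset=0, client_max_format=1))
print(log.needs_down_conversion(fetch_offset=2, client_max_format=1))
```

Because each segment records its own format, the check stays correct even after the configured version has been lowered.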

This message was sent by Atlassian JIRA
