Date: Fri, 14 Oct 2016 03:58:20 +0000 (UTC)
From: "Ismael Juma (JIRA)"
To: dev@kafka.apache.org
Subject: [jira] [Updated] (KAFKA-4298) LogCleaner does not convert compressed message sets properly

[ https://issues.apache.org/jira/browse/KAFKA-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismael Juma updated KAFKA-4298:
-------------------------------
    Description:
When cleaning the log, we don't want to convert messages to the format configured for the topic due to KAFKA-3915.
However, the cleaner logic for writing compressed messages (in case some messages in the message set were not retained) writes the topic message format version in the magic field of the outer message instead of the actual message format. The choice of the absolute/relative offset for the inner messages will also be based on the topic message format version.

For example, if there is an old compressed message set with magic=0 in the log and the topic is configured for magic=1, then after cleaning, the new message set will have a wrapper with magic=1, the nested messages will still have magic=0, but the message offsets will be relative. If this happens, there does not seem to be an easy way to recover without manually fixing up the log.

The offsets still work correctly as both the clients and broker use the outer message format version to decide if the relative offset needs to be converted to an absolute offset. So the main problem turns out to be that `ByteBufferMessageSet.deepIterator` throws an exception if there is a mismatch between outer and inner message format version.

{code}
if (newMessage.magic != wrapperMessage.magic)
  throw new IllegalStateException(s"Compressed message has magic value ${wrapperMessage.magic} " +
    s"but inner message has magic value ${newMessage.magic}")
{code}

was:
When cleaning the log, we attempt to write the cleaned messages using the message format configured for the topic, but as far as I can tell, we do not convert the wrapped messages in compressed message sets.

For example, if there is an old compressed message set with magic=0 in the log and the topic is configured for magic=1, then after cleaning, the new message set will have a wrapper with magic=1, but the nested messages will still have magic=0. If this happens, there does not seem to be an easy way to recover without manually fixing up the log.
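
To make the magic=0 / magic=1 example in the updated description concrete, here is a minimal, self-contained sketch. It is not Kafka's code: `InnerMessage`, `WrapperMessage`, `CleanerMismatchSketch` and all of its methods are hypothetical names invented for illustration. The offset layout is also an assumption (wrapper offset equals the last inner message's absolute offset, and relative inner offsets are counted from the first retained message). Only the check in `deepIterate` is taken from the snippet quoted in the description.

```scala
// Hypothetical, simplified model of a compressed message set.
// These are NOT Kafka's real classes.
final case class InnerMessage(magic: Byte, offset: Long, key: String)
final case class WrapperMessage(magic: Byte, offset: Long, inner: Seq[InnerMessage])

object CleanerMismatchSketch {

  // Models the behaviour described in the issue: the wrapper magic and the
  // offset style follow the topic's configured format, while the retained
  // inner messages keep whatever magic they were written with.
  def cleanWithTopicMagic(original: WrapperMessage,
                          retain: InnerMessage => Boolean,
                          topicMagic: Byte): WrapperMessage = {
    val retained = original.inner.filter(retain)
    require(retained.nonEmpty, "sketch assumes at least one retained message")
    val first = retained.head.offset
    val last = retained.last.offset
    val inner =
      if (topicMagic > 0) retained.map(m => m.copy(offset = m.offset - first)) // relative offsets
      else retained                                                            // absolute offsets
    // Assumed layout: the wrapper carries the last absolute offset.
    WrapperMessage(topicMagic, last, inner)
  }

  // Offsets are interpreted using only the OUTER magic, which is why they
  // still resolve correctly after the mismatched rewrite.
  def absoluteOffsets(wrapper: WrapperMessage): Seq[Long] =
    if (wrapper.magic > 0 && wrapper.inner.nonEmpty) {
      val base = wrapper.offset - wrapper.inner.last.offset
      wrapper.inner.map(m => base + m.offset)
    } else wrapper.inner.map(_.offset)

  // Same check as the snippet quoted in the description: a wrapper whose
  // magic differs from an inner message's magic is rejected.
  def deepIterate(wrapperMessage: WrapperMessage): Seq[InnerMessage] =
    wrapperMessage.inner.map { newMessage =>
      if (newMessage.magic != wrapperMessage.magic)
        throw new IllegalStateException(
          s"Compressed message has magic value ${wrapperMessage.magic} " +
            s"but inner message has magic value ${newMessage.magic}")
      newMessage
    }
}
```

In this model, cleaning a magic=0 message set for a topic configured with magic=1 produces a wrapper with magic=1 around inner messages that keep magic=0 and carry relative offsets. `absoluteOffsets` still recovers the original offsets because it looks only at the wrapper, while `deepIterate` fails on the first inner message.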
> LogCleaner does not convert compressed message sets properly
> ------------------------------------------------------------
>
>                 Key: KAFKA-4298
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4298
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.0.1
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Critical
>             Fix For: 0.10.1.0, 0.10.0.2