From: "Jiangjie Qin (JIRA)"
To: dev@kafka.apache.org
Date: Mon, 22 May 2017 03:08:04 +0000 (UTC)
Subject: [jira] [Updated] (KAFKA-3995) Split the ProducerBatch and resend when received RecordTooLargeException

     [ https://issues.apache.org/jira/browse/KAFKA-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiangjie Qin updated KAFKA-3995:
--------------------------------
    Fix Version/s: 0.11.0.0

> Split the ProducerBatch and resend when received RecordTooLargeException
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-3995
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3995
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>    Affects Versions: 0.10.0.0
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>             Fix For: 0.11.0.0
>
>
> We have recently seen a few cases where a RecordTooLargeException was thrown because the compressed message sent by KafkaProducer exceeded the max message size.
> The root cause is that the compressor estimates the batch size using an estimated compression ratio based on heuristic compression-ratio statistics. This does not work well for traffic with highly variable compression ratios.
> For example, suppose the batch size is set to 1 MB and the max message size is 1 MB. Initially the producer sends messages (each 1 MB uncompressed) to topic_1, whose data compresses to 1/10 of its original size. After a while the estimated compression ratio in the compressor is trained to 1/10, and the producer puts 10 messages into one batch. Now the producer starts sending messages (also 1 MB each) to topic_2, whose data can only be compressed to 1/5 of its original size. The producer still uses 1/10 as the estimated compression ratio and puts 10 messages into a batch. That batch is 2 MB after compression, which exceeds the maximum message size. In this case the user does not have many options other than resending everything or closing the producer, if they care about ordering.
> This is especially an issue for services like MirrorMaker, whose producer is shared by many different topics.
> To solve this issue, we can probably add a configuration "enable.compression.ratio.estimation" to the producer. When this configuration is set to false, we stop estimating the compressed size and instead close the batch once the uncompressed bytes in the batch reach the batch size.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
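To make the arithmetic in the example above concrete, the following is a minimal, self-contained sketch (plain Java, not KafkaProducer internals) of how an optimistic compression-ratio estimate lets a batch exceed the max message size. The class and variable names and the simple append loop are illustrative assumptions; only the sizes and ratios are taken from the example in the issue description.

{code:java}
// Illustrative sketch only -- not KafkaProducer internals.
public class CompressionEstimateSketch {

    public static void main(String[] args) {
        final long batchSizeBytes  = 1_000_000L; // producer batch size from the example (1 MB)
        final long maxMessageBytes = 1_000_000L; // max message size from the example (1 MB)
        final long recordBytes     = 1_000_000L; // each record is 1 MB uncompressed, as in the example

        // Ratio learned while sending topic_1 traffic, which compresses to 1/10.
        double estimatedRatio = 0.10;

        // The producer keeps appending records while the *estimated* compressed size
        // still fits in the batch, so a 0.10 estimate admits 10 MB of uncompressed data.
        long uncompressedInBatch = 0;
        int records = 0;
        while ((uncompressedInBatch + recordBytes) * estimatedRatio <= batchSizeBytes) {
            uncompressedInBatch += recordBytes;
            records++;
        }

        // topic_2 traffic only compresses to 1/5, so the real batch is twice the limit.
        double actualRatio = 0.20;
        long actualCompressed = (long) (uncompressedInBatch * actualRatio);

        System.out.printf("records packed: %d (%d bytes uncompressed)%n", records, uncompressedInBatch);
        System.out.printf("estimated compressed size: %d bytes%n", (long) (uncompressedInBatch * estimatedRatio));
        System.out.printf("actual compressed size:    %d bytes (limit %d)%n", actualCompressed, maxMessageBytes);
        if (actualCompressed > maxMessageBytes) {
            System.out.println("=> the broker would reject this batch with RecordTooLargeException");
        }
    }
}
{code}

Under the proposed behaviour (the suggested "enable.compression.ratio.estimation" set to false), the append loop would compare uncompressedInBatch against batchSizeBytes directly, so the batch would be closed after a single 1 MB record and could not exceed the limit.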