Date: Mon, 16 May 2016 15:35:13 +0000 (UTC)
From: "Ismael Juma (JIRA)"
To: dev@kafka.apache.org
Reply-To: dev@kafka.apache.org
Subject: [jira] [Updated] (KAFKA-3704) Use default block size in KafkaProducer

     [ https://issues.apache.org/jira/browse/KAFKA-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismael Juma updated KAFKA-3704:
-------------------------------
    Description:
As discovered in https://issues.apache.org/jira/browse/KAFKA-3565, the current default block size (1K) used in Snappy and GZIP may cause a sub-optimal
compression ratio for Snappy, and hence reduce throughput. Because we no longer recompress data in the broker, it also impacts what gets stored on disk.

A solution might be to use each library's default block size, which is 64K in LZ4, 32K in Snappy and 0.5K in GZIP. The downside is that this will require more memory to be allocated outside of the buffer pool, so users may need to increase their JVM heap size, especially for MirrorMaker. Using Snappy as an example, it's an additional 2 x 32K per batch (as Snappy uses two buffers), and one would expect at least one batch per partition. However, the number of batches per partition can be much higher if the broker is slow to acknowledge producer requests (depending on `buffer.memory`, `batch.size`, message size, etc.).

Given the above, it seems that a configuration may be needed, as there is no one-size-fits-all value. An alternative to a new config is to allocate buffers from the buffer pool and pass them to the compression library. This is possible with Snappy, and we could adapt our LZ4 code to do the same. It is not possible with GZIP, but GZIP uses a very small buffer by default.

Note that we decided this change was too risky for 0.10.0.0 and reverted the original attempt.

  was:
As discovered in https://issues.apache.org/jira/browse/KAFKA-3565, the current default block size (1K) used in Snappy and GZIP may cause a sub-optimal compression ratio for Snappy, and hence reduce throughput. A better solution would be to use each library's default block size, which is 32K in Snappy and 0.5K in GZIP. A notable side effect is that with Snappy this solution will require extra memory to be allocated outside of the buffer pool, roughly {{(32 - 1)K * num.total.partitions}}, so users may need to increase their JVM heap size, especially for MirrorMaker.
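The worst-case overhead described above can be sketched with some quick arithmetic. This is an illustrative estimate, not code from Kafka: the class and method names are hypothetical, the 32 MB `buffer.memory` and 16 KB `batch.size` values are the producer defaults at the time, and the model simply assumes the buffer pool is fully split into in-flight batches, each carrying two Snappy block buffers.

```java
// Hedged sketch: worst-case extra heap allocated outside the producer's
// buffer pool when Snappy uses its default 32K block size.
public class SnappyOverheadEstimate {

    // Each compressed batch holds `buffersPerStream` block buffers of
    // `blockSize` bytes; at most bufferMemory / batchSize batches can be
    // outstanding at once if the broker is slow to acknowledge requests.
    static long extraHeapBytes(long bufferMemory, long batchSize,
                               long blockSize, int buffersPerStream) {
        long maxBatches = bufferMemory / batchSize;
        return maxBatches * buffersPerStream * blockSize;
    }

    public static void main(String[] args) {
        long bufferMemory = 32L * 1024 * 1024; // producer default buffer.memory
        long batchSize = 16 * 1024;            // producer default batch.size
        long snappyBlock = 32 * 1024;          // Snappy default block size
        long extra = extraHeapBytes(bufferMemory, batchSize, snappyBlock, 2);
        System.out.println(extra); // prints 134217728, i.e. 128 MiB
    }
}
```

Under these assumptions, 2048 batches times 2 x 32K comes to 128 MiB of extra heap — the kind of jump that motivates either a new config or pool-backed buffers rather than a silent default change.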
> Use default block size in KafkaProducer
> ---------------------------------------
>
>                 Key: KAFKA-3704
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3704
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>             Fix For: 0.10.1.0
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)