From: "Andrey Neporada (JIRA)"
To: dev@kafka.apache.org
Date: Tue, 2 Aug 2016 09:32:20 +0000 (UTC)
Subject: [jira] [Commented] (KAFKA-2063) Bound fetch response size

    [ https://issues.apache.org/jira/browse/KAFKA-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403676#comment-15403676 ]

Andrey Neporada commented on KAFKA-2063:
----------------------------------------

(a) I refer to some new server side setting - something like fetch.partition.max.bytes (?).
The broker setting replica.fetch.max.bytes should be deprecated along with the consumer settings fetch.message.max.bytes and max.partition.fetch.bytes.

(b) Maybe I am running too far ahead here. In the context of this ticket, yes, the only goal of reordering is to make progress and enforce fairness, and all of this can be done on the client side.

(c) I mean making the fetch request deterministic on the server side: fetch responses will follow the order requested by the client.

(d) Yes, we should clearly document that clients who want to limit the entire fetch response should also employ some method to avoid starvation/unfairness - either random shuffling or round robin.
Random shuffling seems to be easier to implement and IMHO it will work well enough for ReplicaFetcherThread.

In general, it looks like most people would like to
1) retire the partition level limit from the fetch request
2) keep the fetching order the same as the order of partitions in the fetch request

Should I update the PR? Any objections?

> Bound fetch response size
> -------------------------
>
>                 Key: KAFKA-2063
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2063
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jay Kreps
>
> Currently the only bound on the fetch response size is max.partition.fetch.bytes * num_partitions. There are two problems:
> 1. First, this bound is often large. You may choose max.partition.fetch.bytes=1MB to enable messages of up to 1MB. However if you also need to consume 1k partitions this means you may receive a 1GB response in the worst case!
> 2. The actual memory usage is unpredictable. Partition assignment changes, and you only actually get the full fetch amount when you are behind and there is a full chunk of data ready. This means an application that seems to work fine will suddenly OOM when partitions shift or when the application falls behind.
> We need to decouple the fetch response size from the number of partitions.
> The proposal for doing this would be to add a new field to the fetch request, max_bytes which would control the maximum data bytes we would include in the response.
> The implementation on the server side would grab data from each partition in the fetch request until it hit this limit, then send back just the data for the partitions that fit in the response. The implementation would need to start from a random position in the list of topics included in the fetch request to ensure that in a case of backlog we fairly balance between partitions (to avoid first giving just the first partition until that is exhausted, then the next partition, etc).
> This setting will make the max.partition.fetch.bytes field in the fetch request much less useful and we should discuss just getting rid of it.
> I believe this also solves the same thing we were trying to address in KAFKA-598. The max_bytes setting now becomes the new limit that would need to be compared to max_message size. This can be much larger--e.g. setting a 50MB max_bytes setting would be okay, whereas now if you set 50MB you may need to allocate 50MB*num_partitions.
> This will require evolving the fetch request protocol version to add the new field and we should do a KIP for it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
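
For illustration only, here is a minimal Python sketch of the scheme discussed in this thread: the client shuffles the partition order, and the server fills the response in the requested order until a response-wide max_bytes limit is reached. It is a toy model, not Kafka code. The function names (old_worst_case_bytes, shuffle_partitions, build_fetch_response) are hypothetical, partition data is modelled as plain byte counts, and message boundaries (including a single message larger than max_bytes) are ignored.

```python
import random
from typing import Dict, List


def old_worst_case_bytes(max_partition_fetch_bytes: int, num_partitions: int) -> int:
    """Worst-case response size when the only bound is per partition."""
    return max_partition_fetch_bytes * num_partitions


def shuffle_partitions(partitions: List[str], rng: random.Random) -> List[str]:
    """Client side (hypothetical helper): return a shuffled copy of the
    partition list so that no partition is always served first."""
    shuffled = list(partitions)
    rng.shuffle(shuffled)
    return shuffled


def build_fetch_response(
    requested: List[str],
    available_bytes: Dict[str, int],
    max_bytes: int,
) -> Dict[str, int]:
    """Server side (hypothetical helper): walk the partitions in the order
    the client requested and hand out bytes until max_bytes is used up.

    Returns a mapping of partition -> number of bytes included.
    Assumption: data can be cut at any byte offset, which real message
    sets do not allow.
    """
    response: Dict[str, int] = {}
    remaining = max_bytes
    for partition in requested:
        if remaining <= 0:
            break  # response is full; later partitions wait for the next fetch
        take = min(available_bytes.get(partition, 0), remaining)
        if take > 0:
            response[partition] = take
            remaining -= take
    return response
```

With three partitions holding 40 bytes each and max_bytes=50, the first requested partition contributes 40 bytes, the second 10, and the third nothing. Passing the result of shuffle_partitions as the requested order on each fetch is what keeps a backlogged first partition from starving the others.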