Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id F0907200D08 for ; Wed, 23 Aug 2017 02:40:08 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id EEE54167ECC; Wed, 23 Aug 2017 00:40:08 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 4198F167EC5 for ; Wed, 23 Aug 2017 02:40:08 +0200 (CEST) Received: (qmail 94893 invoked by uid 500); 23 Aug 2017 00:40:06 -0000 Mailing-List: contact jira-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: jira@kafka.apache.org Delivered-To: mailing list jira@kafka.apache.org Received: (qmail 94882 invoked by uid 99); 23 Aug 2017 00:40:05 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 23 Aug 2017 00:40:05 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 8C4111A0B78 for ; Wed, 23 Aug 2017 00:40:05 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -100.002 X-Spam-Level: X-Spam-Status: No, score=-100.002 tagged_above=-999 required=6.31 tests=[RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id rf3_p9zzNMig for ; Wed, 23 Aug 2017 00:40:04 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id 5B24760D12 for ; Wed, 23 Aug 2017 00:40:03 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 73FE6E0059 for ; Wed, 23 Aug 2017 00:40:01 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 8CA4B25387 for ; Wed, 23 Aug 2017 00:40:00 +0000 (UTC) Date: Wed, 23 Aug 2017 00:40:00 +0000 (UTC) From: "Guozhang Wang (JIRA)" To: jira@kafka.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (KAFKA-4113) Allow KTable bootstrap MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Wed, 23 Aug 2017 00:40:09 -0000 [ https://issues.apache.org/jira/browse/KAFKA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-4113: --------------------------------- Issue Type: New Feature (was: Sub-task) Parent: (was: KAFKA-2590) > Allow KTable bootstrap > ---------------------- > > Key: KAFKA-4113 > URL: https://issues.apache.org/jira/browse/KAFKA-4113 > Project: Kafka > Issue Type: New Feature > Components: streams > Reporter: Matthias J. Sax > Assignee: Guozhang Wang > > On the mailing list, there are multiple request about the possibility to "fully populate" a KTable before actual stream processing start. > Even if it is somewhat difficult to define, when the initial populating phase should end, there are multiple possibilities: > The main idea is, that there is a rarely updated topic that contains the data. Only after this topic got read completely and the KTable is ready, the application should start processing. This would indicate, that on startup, the current partition sizes must be fetched and stored, and after KTable got populated up to those offsets, stream processing can start. > Other discussed ideas are: > 1) an initial fixed time period for populating > (it might be hard for a user to estimate the correct value) > 2) an "idle" period, ie, if no update to a KTable for a certain time is > done, we consider it as populated > 3) a timestamp cut off point, ie, all records with an older timestamp > belong to the initial populating phase > The API change is not decided yet, and the API desing is part of this JIRA. > One suggestion (for option (4)) was: > {noformat} > KTable table = builder.table("topic", 1000); // populate the table without reading any other topics until see one record with timestamp 1000. > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)