Return-Path: X-Original-To: apmail-flink-user-archive@minotaur.apache.org Delivered-To: apmail-flink-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 9171F18D50 for ; Tue, 9 Feb 2016 14:26:13 +0000 (UTC) Received: (qmail 45597 invoked by uid 500); 9 Feb 2016 14:26:03 -0000 Delivered-To: apmail-flink-user-archive@flink.apache.org Received: (qmail 45502 invoked by uid 500); 9 Feb 2016 14:26:03 -0000 Mailing-List: contact user-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@flink.apache.org Delivered-To: mailing list user@flink.apache.org Received: (qmail 45492 invoked by uid 99); 9 Feb 2016 14:26:03 -0000 Received: from Unknown (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 09 Feb 2016 14:26:03 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id EB7AD1804E3 for ; Tue, 9 Feb 2016 14:26:02 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.285 X-Spam-Level: ** X-Spam-Status: No, score=2.285 tagged_above=-999 required=6.31 tests=[RCVD_IN_DNSWL_NONE=-0.0001, SPF_SOFTFAIL=0.972, URI_HEX=1.313] autolearn=disabled Received: from mx1-us-east.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id IDE85QvROl_6 for ; Tue, 9 Feb 2016 14:26:02 +0000 (UTC) Received: from mwork.nabble.com (mwork.nabble.com [162.253.133.43]) by mx1-us-east.apache.org (ASF Mail Server at mx1-us-east.apache.org) with ESMTP id CDABB429A6 for ; Tue, 9 Feb 2016 14:26:01 +0000 (UTC) Received: from mjoe.nabble.com (unknown [162.253.133.57]) by mwork.nabble.com (Postfix) with ESMTP id 5A49813154858 for ; Tue, 9 Feb 2016 06:18:55 -0800 (PST) Date: Tue, 9 Feb 2016 06:01:31 -0800 (PST) From: shikhar To: user@flink.apache.org Message-ID: <1455026491009-4816.post@n4.nabble.com> In-Reply-To: References: <1454961437146-4782.post@n4.nabble.com> <1454964196626-4786.post@n4.nabble.com> <1454964664954-4788.post@n4.nabble.com> Subject: Re: Kafka partition alignment for event time MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit I am assigning timestamps using a threshold-based extractor -- the static delta from last timestamp is probably sufficient and the PriorityQueue for allowing outliers not necessary, that is something I added while figuring out what was going on. The timestamps across partitions don't differ that much in normal operation when stream processing is caught up with the head of the partitions, so the thresholding works well. However, during catch-up, like if I stop for a bit & start the job again, or there is no offset in ZK and I'm using 'auto.offset.reset=smallest', the source tends to emit messages with much larger deviations, and the timestamp extraction which is not partition-aware will start providing an incorrect watermark. Aljoscha Krettek wrote > Hi, > in general it should not be a problem if one parallel instance of a sink > is responsible for several Kafka partitions. It can become a problem if > the timestamps in the different partitions differ by a lot and the > watermark assignment logic is not able to handle this. > > How are you assigning the timestamps/watermarks in your job? > > Cheers, > Aljoscha >> On 08 Feb 2016, at 21:51, shikhar < > shikhar@ > > wrote: >> >> Stephan explained in that thread that we're picking the min watermark >> when >> doing operations that join streams from multiple sources. If we have m:n >> partition-source assignment where m>n, the source is going to end up with >> the max watermark. Having m<=n ensures that the lowest watermark is used. >> >> Re: automatic enforcement, perhaps allowing for more than 1 Kafka >> partition >> on a source should require opt-in, e.g. allowOversubscription() >> >> >> >> -- >> View this message in context: >> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-partition-alignment-for-event-time-tp4782p4788.html >> Sent from the Apache Flink User Mailing List archive. mailing list >> archive at Nabble.com. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-partition-alignment-for-event-time-tp4782p4816.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.