From dev-return-105463-archive-asf-public=cust-asf.ponee.io@kafka.apache.org Wed Jul 3 01:01:47 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id DFA0E18060E for ; Wed, 3 Jul 2019 03:01:46 +0200 (CEST) Received: (qmail 46075 invoked by uid 500); 3 Jul 2019 01:01:44 -0000 Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@kafka.apache.org Delivered-To: mailing list dev@kafka.apache.org Received: (qmail 46060 invoked by uid 99); 3 Jul 2019 01:01:43 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 03 Jul 2019 01:01:43 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id E863BC0DEA for ; Wed, 3 Jul 2019 01:01:42 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.799 X-Spam-Level: * X-Spam-Status: No, score=1.799 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-ec2-va.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id sl9dyUCYWTHR for ; Wed, 3 Jul 2019 01:01:41 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=209.85.222.170; helo=mail-qk1-f170.google.com; envelope-from=adam.bellemare@gmail.com; receiver= Received: from mail-qk1-f170.google.com (mail-qk1-f170.google.com [209.85.222.170]) by mx1-ec2-va.apache.org (ASF Mail Server at mx1-ec2-va.apache.org) with ESMTPS id 6CE82BC775 for ; Wed, 3 Jul 2019 01:01:41 +0000 (UTC) Received: by mail-qk1-f170.google.com with SMTP id r4so422578qkm.13 for ; Tue, 02 Jul 2019 18:01:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=mo8hqCleQIyrhiADQD39sUuu8pxHVWAb5nh+Fx6QYns=; b=h+AQ+fd+5Kpwwxp61iuudCtAmm75v9s8zDDwOp68vXiE+zqzmYWOMrCpU9Qgi2FkpV nfyphBWjlOSlNiqrWOk9hV3ll9NDwTAPBgGqmhIq2+y4qhiNbODC3jV3Bs06Ovb8ZOu2 xdoUDK/f83foAGkNO2GZZOiW7U9p02Oy0cMGCcT8IW7xCNQWFkqMtxA+aGj4Izu1kA4F 1/ulSwUVhlG/OQ+6s8ChhxDlolcMwommLQ67oOZXwt6YoHaR5dQOuTFVCyFfdvpUQ8Mm fm2wyemmZfP+02IKcJzGg0vR4t1oFaoFJ9EO4L5KFDmatRAnf+Wpn3gJkO07kldUJj/9 brdg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=mo8hqCleQIyrhiADQD39sUuu8pxHVWAb5nh+Fx6QYns=; b=eyqh/6qdVxRZ3wTL8UqNPwUBUrKAydnQR7vZ4Tphy6mrHuk3Vgsmqw7KU5M1El5cYn DWoH+gyuB+y8mxDJDo/WxaX+ZP+LnReI3/w3VO+OcuZHmpTCroziklH4QbqmantHtNg0 rI7TGgJNLv54iWzgawuKYHSxBMYFGK7MczIq8bUAYIp27iozVAJDeOMweODWNEd/o0b6 TVxU/XGjaB3T0OcaUyK6hOj/BRlopNH2hvBOwxS7Punh1X8OkNRJR9VgWYFUu5qdklP8 F2fQErEaTCPXMvp0sH5Yzr1EWFzzBPZqlf7wFLByv4kQLFHc6GZnT7QDOj6z0IhBpctQ tccw== X-Gm-Message-State: APjAAAUV3xG/2glKzu9TYaCE+feD0Qeaqjz5Ho5MpumGktZgN1etGCQ8 8qNtdsl2dn8CMrH4k47W3tks7g6WdKsOy2FXcleGVp8D X-Google-Smtp-Source: APXvYqxb1Jg2PDhnHoFrZkRUaner49HtL1Jy+OaJyFDNGQfOg1oZX09Dui9lVM23sIdhueBMU83ogBJt3FMBKwb7tk8= X-Received: by 2002:a05:620a:1286:: with SMTP id w6mr27101171qki.219.1562115694863; Tue, 02 Jul 2019 18:01:34 -0700 (PDT) MIME-Version: 1.0 From: Adam Bellemare Date: Tue, 2 Jul 2019 21:01:23 -0400 Message-ID: Subject: Synchronized consumption + processing based on timestamps? To: dev@kafka.apache.org Content-Type: multipart/alternative; boundary="000000000000172ee8058cbc67b2" --000000000000172ee8058cbc67b2 Content-Type: text/plain; charset="UTF-8" Hi All The use-case is pretty simple. Lets say we have a history of events with the following: key=userId, value = (timestamp, productId) and we want to remap it to (just as we would with an internal topic): key=productId, value=(original_timestamp, userId) Now, say I have 30 days of backlog, and 2 partitions for the input topic. I spin up two instances and let them process the data from the start of time, but one instance is only half as powerful (less CPU, Mem, etc), such that instance 0 processes X events / sec which instance 1 processes x/2 events /sec. My question is: Are there *any* clients, kafka streams, spark, flink, etc or otherwise, that would allow these two consumers to remain in sync *according to their timestamps*? I don't want to see events with original_timestamp of today (from instance 0) interleaved with events from 15 days ago (from the underpowered instance 1). Yes, I do realize this would bring my throughput down, but I am looking for any existing tech that would effectively say *"cap the time difference of events coming out of this repartition processor at 60 seconds max"* Currently, I am not aware of ANY open source solutions that do this for Kafka, but if someone has heard otherwise I would love to know. Alternately, perhaps this could lead to a KIP. Thanks Adam --000000000000172ee8058cbc67b2--