From: Reuven Lax
Date: Tue, 9 Oct 2018 11:07:00 -0700
Subject: Re: Update state after firing
To: dev@beam.apache.org

Have you considered using Beam's state API for this?

On Tue, Oct 9, 2018 at 11:03 AM Xinyu Liu wrote:

> Hi, guys,
>
> Current triggering allows us to either discard or accumulate the state
> after a window pane is fired. We use extractOutput() in CombineFn to
> return the output value after the firing. All of this has been working
> well for us. We do have a use case that does not seem to be handled
> here: we would like to update the state after the firing. Let me
> illustrate this use case with an example: we have a 10-minute fixed
> window with a repeated early trigger every 1 minute over an input stream
> that contains events of user id and page id. The accumulator for the
> window has two parts: 1) the set of page ids already seen; 2) the set of
> user ids who are the first to view a page in this window (this is done
> by looking up #1). For each early firing, we want to output #2 and clear
> the second part of the state, but we would like to keep #1 around for
> later calculations in this window. This example might be too simple to
> make sense, but it comes from one of our real use cases, which is needed
> for some anti-abuse scenarios.
>
> To address this use case, would it be OK to add an
> AccumT updateAfterFiring(AccumT accumulator) method to the current
> CombineFn? That way the user can choose to update the state partially if
> needed, e.g. for our use case. Any feedback is very welcome.
>
> Thanks,
> Xinyu
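
The two-part accumulator described in the quoted message can be sketched in plain Java as below. This is only an illustration under stated assumptions: it does not use the Beam SDK, the class names FirstViewAccum, FirstViewCombine and FirstViewDemo are made up for this sketch, and updateAfterFiring is the hook proposed in this thread, not an existing CombineFn method. Early firings are simulated by calling extractOutput followed by updateAfterFiring by hand.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical two-part accumulator from the example in the thread. */
final class FirstViewAccum {
    // Part #1: page ids already seen; kept for the whole window.
    final Set<String> seenPageIds = new HashSet<>();
    // Part #2: user ids who first viewed a page; cleared after each firing.
    // A TreeSet is used only to make the output order deterministic.
    final Set<String> firstViewerIds = new TreeSet<>();
}

/** Plain-Java stand-in for a CombineFn; it does not extend any Beam class. */
final class FirstViewCombine {
    static FirstViewAccum createAccumulator() {
        return new FirstViewAccum();
    }

    static FirstViewAccum addInput(FirstViewAccum acc, String userId, String pageId) {
        // Set.add returns true only if the page was not seen before in this window.
        if (acc.seenPageIds.add(pageId)) {
            acc.firstViewerIds.add(userId);
        }
        return acc;
    }

    static List<String> extractOutput(FirstViewAccum acc) {
        // Output only part #2, as in the use case.
        return new ArrayList<>(acc.firstViewerIds);
    }

    /** Assumption: the hook proposed in the thread; not part of Beam's API. */
    static FirstViewAccum updateAfterFiring(FirstViewAccum acc) {
        acc.firstViewerIds.clear(); // drop part #2, keep part #1
        return acc;
    }
}

final class FirstViewDemo {
    public static void main(String[] args) {
        FirstViewAccum acc = FirstViewCombine.createAccumulator();
        FirstViewCombine.addInput(acc, "alice", "p1");
        FirstViewCombine.addInput(acc, "bob", "p1"); // p1 already seen, bob is not recorded

        // Simulated first early firing.
        System.out.println(FirstViewCombine.extractOutput(acc)); // prints [alice]
        acc = FirstViewCombine.updateAfterFiring(acc);

        FirstViewCombine.addInput(acc, "bob", "p2");
        FirstViewCombine.addInput(acc, "carol", "p1"); // p1 is still remembered

        // Simulated second early firing.
        System.out.println(FirstViewCombine.extractOutput(acc)); // prints [bob]
    }
}
```

The second firing reports only bob because the set of seen pages survives the first firing while the set of first viewers does not. That partial reset is what neither discarding nor accumulating mode provides on its own.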