From: Flavio Pompermaier
Date: Thu, 26 Oct 2017 11:19:13 +0200
Subject: Re: State snapshotting when source is finite
To: Till Rohrmann
Cc: Fabian Hueske, user@flink.apache.org, Aljoscha Krettek

Done: https://issues.apache.org/jira/browse/FLINK-7930

Best,
Flavio

On Thu, Oct 26, 2017 at 10:52 AM, Till Rohrmann wrote:

> Hi Flavio,
>
> this kind of feature is indeed useful and currently not supported by
> Flink. I think, however, that this feature is a bit trickier to implement,
> because Tasks cannot currently initiate checkpoints/savepoints on their
> own. This would entail some changes to the lifecycle of a Task and an
> extra communication step with the JobManager. However, nothing impossible
> to do.
>
> Please open a JIRA issue with a description of the problem where we can
> continue the discussion.
>
> Cheers,
> Till
>
> On Thu, Oct 26, 2017 at 9:58 AM, Fabian Hueske wrote:
>
>> Hi Flavio,
>>
>> Thanks for bringing up this topic.
>> I think running periodic jobs with state that gets restored and
>> persisted in a savepoint is a very valid use case and would fit the
>> "stream is a superset of batch" story quite well.
>> I'm not sure if this behavior is already supported, but I think it would
>> be a desirable feature.
>>
>> I'm looping in Till and Aljoscha, who might have some thoughts on this
>> as well.
>> Depending on the discussion we should open a JIRA for this feature.
>>
>> Cheers, Fabian
>>
>> 2017-10-25 10:31 GMT+02:00 Flavio Pompermaier:
>>
>>> Hi to all,
>>> in my current use case I'd like to improve one step of our batch
>>> pipeline.
>>> There's one specific job that ingests a tabular dataset (of Rows) and
>>> explodes it into a set of RDF statements (as Tuples). The objects we
>>> output are containers of those Tuples (grouped by a field).
>>> Flink stateful streaming could be a perfect fit here because we
>>> incrementally grow the state of those containers without having to
>>> spend a lot of time performing GET operations against an external
>>> key-value store.
>>> The big problem is that the sources are finite and the state of the job
>>> gets lost once the job ends, while I was expecting Flink to snapshot
>>> the state of its operators before exiting.
>>>
>>> This idea was inspired by
>>> https://data-artisans.com/blog/queryable-state-use-case-demo#no-external-store,
>>> with the difference that one can resume the state of the stateful
>>> application only when required.
>>> Do you think it would be possible to support such a use case (which we
>>> can summarize as "periodic batch jobs that pick up where they left
>>> off")?
>>>
>>> Best,
>>> Flavio
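For context, here is a minimal sketch (against the Flink 1.3-era DataStream API) of the kind of job described above: a finite source feeding a keyed operator whose per-key state grows incrementally. The class names, the toy in-memory source, and the String-based statement encoding are illustrative only and are not taken from the thread. The point is that once the finite source is exhausted the job finishes and the MapState below is simply discarded, unless a savepoint is triggered externally beforehand (e.g. bin/flink savepoint <jobId>, later resumed with bin/flink run -s <savepointPath> ...); the feature discussed in the thread, now tracked in FLINK-7930, would snapshot this state automatically before the job exits.

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class RdfAccumulatorSketch {

    // Accumulates per-subject RDF statements in keyed state instead of
    // issuing GETs against an external key-value store.
    public static class StatementAccumulator
            extends RichFlatMapFunction<Tuple3<String, String, String>, String> {

        private transient MapState<String, String> statements;

        @Override
        public void open(Configuration parameters) {
            statements = getRuntimeContext().getMapState(
                    new MapStateDescriptor<>("statements", String.class, String.class));
        }

        @Override
        public void flatMap(Tuple3<String, String, String> stmt,
                            Collector<String> out) throws Exception {
            // Incrementally grow the container for this subject (the key).
            statements.put(stmt.f1, stmt.f2);
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000);

        // Finite source: when it is exhausted the job finishes and, as
        // discussed in the thread, the keyed state above is lost today.
        env.fromElements(
                Tuple3.of("s1", "rdf:type", "foaf:Person"),
                Tuple3.of("s1", "foaf:name", "Flavio"))
           .keyBy(new KeySelector<Tuple3<String, String, String>, String>() {
               @Override
               public String getKey(Tuple3<String, String, String> t) {
                   return t.f0;
               }
           })
           .flatMap(new StatementAccumulator())
           .print();

        env.execute("periodic RDF accumulation sketch");
    }
}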