flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: State snapshotting when source is finite
Date Thu, 26 Oct 2017 09:19:13 GMT
Done: https://issues.apache.org/jira/browse/FLINK-7930

Best,
Flavio

On Thu, Oct 26, 2017 at 10:52 AM, Till Rohrmann <trohrmann@apache.org>
wrote:

> Hi Flavio,
>
> this kind of feature is indeed useful and currently not supported by
> Flink. I think, however, that this feature is a bit trickier to implement,
> because Tasks cannot currently initiate checkpoints/savepoints on their
> own. This would entail some changes to the lifecycle of a Task and an extra
> communication step with the JobManager. However, nothing impossible to do.
>
> Please open a JIRA issue with the description of the problem where we can
> continue the discussion.
>
> Cheers,
> Till
>
> On Thu, Oct 26, 2017 at 9:58 AM, Fabian Hueske <fhueske@gmail.com> wrote:
>
>> Hi Flavio,
>>
>> Thanks for bringing up this topic.
>> I think running periodic jobs with state that gets restored and persisted
>> in a savepoint is a very valid use case and would fit the "stream is a
>> superset of batch" story quite well.
>> I'm not sure if this behavior is already supported, but I think this would
>> be a desirable feature.
>>
>> I'm looping in Till and Aljoscha who might have some thoughts on this as
>> well.
>> Depending on the discussion we should open a JIRA for this feature.
>>
>> Cheers, Fabian
>>
>> 2017-10-25 10:31 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>
>>> Hi to all,
>>> in my current use case I'd like to improve one step of our batch
>>> pipeline.
>>> There's one specific job that ingests a tabular dataset (of Rows) and
>>> explodes it into a set of RDF statements (as Tuples). The objects we output
>>> are containers of those Tuples (grouped by a field).
>>> Flink stateful streaming could be a perfect fit here because we
>>> incrementally grow the state of those containers, and we would not have to
>>> spend a lot of time performing GET operations against an external key-value
>>> store.
>>> The big problem here is that the sources are finite and the state of the
>>> job gets lost once the job ends, whereas I was expecting Flink to snapshot
>>> the state of its operators before exiting.
>>>
>>> This idea was inspired by
>>> https://data-artisans.com/blog/queryable-state-use-case-demo#no-external-store,
>>> with the difference that one would restore the state of the stateful
>>> application only when required.
>>> Do you think it would be possible to support such a use case (which we can
>>> summarize as "periodic batch jobs that pick up where they left off")?
>>>
>>> Best,
>>> Flavio
>>>
>>
>>
>
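The "periodic batch jobs that pick up where they left off" semantics requested in this thread can be illustrated outside Flink. The following is a plain-Python sketch under stated assumptions, not Flink code and not a proposed Flink API: each run restores per-key container state from a snapshot, explodes every row into (subject, predicate, object) tuples, appends them to the container for that subject, and writes the snapshot back before the job ends. The row layout (an "id" column used as the grouping field), the JSON file standing in for a savepoint, and the names explode and run_batch are all hypothetical.

```python
import json
import os
import tempfile
from collections import defaultdict


def explode(row):
    """Turn one tabular row into RDF-like (subject, predicate, object) tuples.

    Hypothetical layout: the "id" column is the subject, and every other
    column becomes one statement.
    """
    subject = row["id"]
    return [(subject, col, val) for col, val in row.items() if col != "id"]


def run_batch(rows, snapshot_path):
    """Process one finite batch, resuming container state from the last snapshot."""
    # Restore the state left by the previous run, if any.
    # The JSON file is a hypothetical stand-in for a Flink savepoint.
    containers = defaultdict(list)
    if os.path.exists(snapshot_path):
        with open(snapshot_path) as f:
            for subject, statements in json.load(f).items():
                containers[subject] = [tuple(s) for s in statements]

    # Incrementally grow each container, keyed by subject (the grouping field),
    # without looking anything up in an external key-value store.
    for row in rows:
        for statement in explode(row):
            containers[statement[0]].append(statement)

    # Snapshot the state before the job ends so the next run can pick it up.
    with open(snapshot_path, "w") as f:
        json.dump(containers, f)
    return dict(containers)


# Two consecutive "periodic batch jobs" sharing one snapshot.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "state.json")
    run_batch([{"id": "a", "name": "Alice"}], path)
    state = run_batch([{"id": "a", "age": 30}, {"id": "b", "name": "Bob"}], path)
    print(state["a"])  # [('a', 'name', 'Alice'), ('a', 'age', 30)]
```

In Flink terms the containers would live in keyed state and the snapshot would be a savepoint; as Till notes, a Task currently cannot trigger that savepoint on its own when its finite source is exhausted, which is the missing piece tracked in FLINK-7930.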
