flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Fabian Hueske <fhue...@gmail.com>
Subject Re: [DISCUSS] Flink and Ignite integration
Date Tue, 28 Apr 2015 22:55:28 GMT
Thanks Cos for starting this discussion, hi to the Ignite community!

The probably easiest and most straightforward integration of Flink and
Ignite would be to go through Ignite's IGFS. Flink can be easily extended
to support additional filesystems.

However, the Flink community is currently also looking for a solution to
checkpoint operator state of running stream processing programs. Flink
processes data streams in real time similar to Storm, i.e., it schedules
all operators of a streaming program and data is continuously flowing from
operator to operator. Instead of acknowledging each individual record,
Flink injects stream offset markers into the stream in regular intervals.
Whenever, an operator receives such a marker it checkpoints its current
state (currently to the master with some limitations). In case of a
failure, the stream is replayed (using a replayable source such as Kafka)
from the last checkpoint that was not received by all sink operators and
all operator states are reset to that checkpoint.
We had already looked at Ignite and were wondering whether Ignite could be
used to reliably persist the state of streaming operator.

The other points I mentioned on Twitter are just rough ideas at the moment.

Cheers, Fabian

2015-04-29 0:23 GMT+02:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:

> Thanks Cos.
> Hello Flink Community.
> From Ignite standpoint we definitely would be interested in providing Flink
> processing API on top of Ignite Data Grid or IGFS. It would be interesting
> to hear what steps would be required for such integration or if there are
> other integration points.
> D.
> On Tue, Apr 28, 2015 at 2:57 PM, Konstantin Boudnik <cos@apache.org>
> wrote:
> > Following the lively exchange in Twitter (sic!) I would like to bring
> > together
> > Ignite and Flink communities to discuss the benefits of the integration
> and
> > see where we can start it.
> >
> > We have this recently opened ticket
> >   https://issues.apache.org/jira/browse/IGNITE-813
> >
> > and Fabian has listed the following points:
> >
> >  1) data store
> >  2) parameter server for ML models
> >  3) Checkpointing streaming op state
> >  4) continuously updating views from streams
> >
> > I'd add
> >  5) using Ignite IGFS to speed up Flink's access to HDFS data.
> >
> > I see a lot of interesting correlations between two projects and wonder
> if
> > Flink guys can step up with a few thoughts on where Flink can benefit the
> > most
> > from Ignite's in-memory fabric architecture? Perhaps, it can be used as
> > in-memory storage where the other components of the stack can quickly
> > access
> > and work w/ the data w/o a need to dump it back to slow storage?
> >
> > Thoughts?
> >   Cos
> >

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message