Date: Wed, 29 Apr 2015 11:49:53 +0200
Subject: Re: [DISCUSS] Flink and Ignite integration
From: Fabian Hueske
To: "dev@flink.apache.org"
Cc: "dev@ignite.incubator.apache.org"
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=089e014939fcaa3d850514d9e691

--089e014939fcaa3d850514d9e691
Content-Type: text/plain; charset=UTF-8

That's a good question...

We are still in the design phase for this feature. Initially I would have
said that replicated in-memory storage is what we want. However, Flink is
aiming to support long-running stream analytics (weeks, months, ...), and it
would be bad if state collected over such a long time were lost. So some
kind of disk persistence would be good for certain use cases.

2015-04-29 1:28 GMT+02:00 Dmitriy Setrakyan:

> On Tue, Apr 28, 2015 at 5:55 PM, Fabian Hueske wrote:
>
> > Thanks Cos for starting this discussion, hi to the Ignite community!
> >
> > Probably the easiest and most straightforward integration of Flink and
> > Ignite would be to go through Ignite's IGFS. Flink can be easily extended
> > to support additional filesystems.
> >
> > However, the Flink community is currently also looking for a solution to
> > checkpoint operator state of running stream processing programs.
> > Flink processes data streams in real time, similar to Storm, i.e., it
> > schedules all operators of a streaming program and data is continuously
> > flowing from operator to operator. Instead of acknowledging each
> > individual record, Flink injects stream offset markers into the stream
> > at regular intervals. Whenever an operator receives such a marker, it
> > checkpoints its current state (currently to the master, with some
> > limitations). In case of a failure, the stream is replayed (using a
> > replayable source such as Kafka) from the last checkpoint that was not
> > received by all sink operators and all operator states are reset to that
> > checkpoint.
> > We had already looked at Ignite and were wondering whether Ignite could
> > be used to reliably persist the state of streaming operators.
> >
>
> Fabian, do you need these checkpoints stored in memory (with optional
> redundant copies, of course) or on disk? I think in-memory makes a lot more
> sense from a performance standpoint, and can easily be done in Ignite.
>
> >
> > The other points I mentioned on Twitter are just rough ideas at the
> > moment.
> >
> > Cheers, Fabian
> >
> > 2015-04-29 0:23 GMT+02:00 Dmitriy Setrakyan:
> >
> > > Thanks Cos.
> > >
> > > Hello Flink Community.
> > >
> > > From the Ignite standpoint we definitely would be interested in
> > > providing a Flink processing API on top of Ignite Data Grid or IGFS.
> > > It would be interesting to hear what steps would be required for such
> > > an integration or if there are other integration points.
> > >
> > > D.
> > >
> > > On Tue, Apr 28, 2015 at 2:57 PM, Konstantin Boudnik wrote:
> > >
> > > > Following the lively exchange on Twitter (sic!) I would like to
> > > > bring together the Ignite and Flink communities to discuss the
> > > > benefits of the integration and see where we can start it.
> > > >
> > > > We have this recently opened ticket
> > > > https://issues.apache.org/jira/browse/IGNITE-813
> > > >
> > > > and Fabian has listed the following points:
> > > >
> > > > 1) data store
> > > > 2) parameter server for ML models
> > > > 3) Checkpointing streaming op state
> > > > 4) continuously updating views from streams
> > > >
> > > > I'd add
> > > > 5) using Ignite IGFS to speed up Flink's access to HDFS data.
> > > >
> > > > I see a lot of interesting correlations between two projects and
> > > > wonder if Flink guys can step up with a few thoughts on where Flink
> > > > can benefit the most from Ignite's in-memory fabric architecture?
> > > > Perhaps, it can be used as in-memory storage where the other
> > > > components of the stack can quickly access and work w/ the data w/o
> > > > a need to dump it back to slow storage?
> > > >
> > > > Thoughts?
> > > > Cos
> > > >
> > >
> >
>

--089e014939fcaa3d850514d9e691--
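
For readers of this thread, the checkpointing scheme described in the quoted message (markers at regular intervals, a state snapshot on each marker, replay from a replayable source after a failure) can be illustrated with a small sketch. This is not Flink or Ignite code. `CheckpointStore`, `run`, `interval`, and `fail_at` are hypothetical names invented for the example, the store is a plain in-memory dictionary standing in for whatever reliable storage is eventually chosen, and the job is reduced to a single counting operator, so the question of which checkpoint has reached all sinks does not arise.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointStore:
    """Hypothetical in-memory stand-in for a reliable checkpoint store."""
    snapshots: dict = field(default_factory=dict)

    def save(self, offset, state):
        # Copy the state so later updates do not leak into the snapshot.
        self.snapshots[offset] = dict(state)

    def latest(self):
        # Return (offset, state) of the newest snapshot, or a clean start.
        if not self.snapshots:
            return 0, {}
        offset = max(self.snapshots)
        return offset, dict(self.snapshots[offset])


def run(source, store, interval, fail_at=None):
    """Count words from a replayable source, snapshotting at each marker.

    A marker is assumed to follow every `interval` records. `fail_at`
    simulates a crash just before the record at that offset is processed.
    """
    offset, counts = store.latest()  # reset state to the last checkpoint
    while offset < len(source):
        if fail_at is not None and offset == fail_at:
            raise RuntimeError(f"simulated failure at offset {offset}")
        word = source[offset]
        counts[word] = counts.get(word, 0) + 1
        offset += 1
        if offset % interval == 0:  # marker reached: checkpoint the state
            store.save(offset, counts)
    return counts


if __name__ == "__main__":
    source = ["a", "b", "a", "c", "b", "a", "a"]  # replayable, like a log
    store = CheckpointStore()
    try:
        run(source, store, interval=3, fail_at=5)
    except RuntimeError as err:
        print(err)
    # Recovery: state is restored from the snapshot and the source is
    # replayed from the snapshot's offset, so no record is counted twice.
    print(run(source, store, interval=3))
```

Because the snapshot stores the stream offset together with the state, records processed after the last marker but before the crash are simply reprocessed on recovery. Whether the store lives in replicated memory or on disk, which is the question raised at the top of this message, only changes how `CheckpointStore` is backed, not the control flow.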