Return-Path: X-Original-To: apmail-flink-user-archive@minotaur.apache.org Delivered-To: apmail-flink-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 2576318109 for ; Tue, 30 Jun 2015 12:24:20 +0000 (UTC) Received: (qmail 40745 invoked by uid 500); 30 Jun 2015 12:24:15 -0000 Delivered-To: apmail-flink-user-archive@flink.apache.org Received: (qmail 40662 invoked by uid 500); 30 Jun 2015 12:24:14 -0000 Mailing-List: contact user-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@flink.apache.org Delivered-To: mailing list user@flink.apache.org Received: (qmail 40652 invoked by uid 99); 30 Jun 2015 12:24:14 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 30 Jun 2015 12:24:14 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of gyula.fora@gmail.com designates 209.85.215.47 as permitted sender) Received: from [209.85.215.47] (HELO mail-la0-f47.google.com) (209.85.215.47) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 30 Jun 2015 12:22:00 +0000 Received: by lagc2 with SMTP id c2so11026968lag.3 for ; Tue, 30 Jun 2015 05:23:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :content-type; bh=rYFRoNnmmtm6ik9P+b00BkD2VlRNlW8vwfbZ9jCxB88=; b=E/+B8da6T+6olK6sHQ0yR8jCiq8pmbfdovnRfFEgnj6pq3ybZObn2Dt8U0GRR3h64Q ZpDE96FSZue2p2e0YRLjh4SBJbV+nxmlMC08vqBNE27NnrJqlqSKmL0whFQJyzZJu4qt EYE/e+nMeym9uhHM/SGc9MvugC2Ul8G0XX4Cfh+iRtpVK3RFBXB5JMdyKzkrAgJRN8kL 3qAdBfzol31z2XOs5WGt4XIE6E+IWmT+MIlwlHBKhsw1SslmiCyILoxPskk4chYpGsNP ImZpCM7qcZuv/N8GW80Fi5vTf4n2nmiQC+TmYifgOv+2kDOca7FkRQAKHwTskjHOpw74 pQJg== X-Received: by 10.152.44.193 with SMTP id g1mr19388139lam.15.1435667028381; Tue, 30 Jun 2015 05:23:48 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: =?UTF-8?Q?Gyula_F=C3=B3ra?= Date: Tue, 30 Jun 2015 12:23:38 +0000 Message-ID: Subject: Re: Apache Flink and serious streaming stateful processing To: user@flink.apache.org Content-Type: multipart/alternative; boundary=089e0160bda43e14260519bb4793 X-Virus-Checked: Checked by ClamAV on apache.org --089e0160bda43e14260519bb4793 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hi Krzysztof, Thank you for your questions, we are happy to help you getting started. Regarding your questions: 1. There is backpressure for the streams, so if the downstream operators cannot keep up the sources will slow down. 2. We have support for stateful processing in Flink in many ways you have described in your question. Unfortunately the docs are down currently but you should check out the 'Stateful processing' section in the 0.10 docs (once its back online). We practically support an OperatorState interface which let's you keep partitioned state by some key and access it from runtime operators. The states declared using these interfaces are checkpointed and will be restored on failure. Currently all the states are stored in-memory but we are planning to extend it to allow writing state updates to external systems. I will send you some pointers once the docs are up again. Cheers, Gyula Krzysztof Zarzycki ezt =C3=ADrta (id=C5=91pont: 2015= . j=C3=BAn. 30., K, 14:07): > Greetings! > I'm extremely interested in Apache Flink, I think you're doing really a > great job! But please allow me to share two things that I would require > from Apache Flink to consider it as groundbreaking (it is what I need for > Streaming framework): > > 1. Stream backpressure. When stream processing part does not keep up, > please pause receiving new data. This is a serious problem in other > frameworks, like Spark Streaming. Please see the ticket in Spark about it= : > https://issues.apache.org/jira/browse/SPARK-7398 > 2. Support for (serious) stateful processing. What I mean by that is to b= e > able to keep state of the application in key-value stores, out-of-core, i= n > embedded mode. I want to be able to keep, let's say history of events fro= m > last two months, grouped & accessible by user_id, and don't want to use > external database for that (e.g. Cassandra). Communicating with external > database would kill my performance especially when *reprocessing* > historical data. And I definitely don't want to escape to batch processin= g > (like in Lambda Architecture). > > These two are the most important (IMHO) lacks in Spark Streaming and are > the reasons I'm not using it. These two are supported by Samza, which in > code and API is not excellent, but at least allows serious stream > processing, that does not require repeating the processing pipeline in > batch (Hadoop). > > I'm looking forward to seeing features like these in Flink. Or they are > already there and I'm just missing something? > > Thanks! > Krzysztof Zarzycki > --089e0160bda43e14260519bb4793 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi=C2=A0Krzysztof,

Tha= nk you for your questions, we are happy to help you getting started.=

Regarding your questions:

1. There is backpressure for the streams, so if = the downstream operators cannot keep up the sources will slow down.<= /div>

2. We have support for stateful processin= g in Flink in many ways you have described in your question. Unfortunately = the docs are down currently but you should check out the 'Stateful proc= essing' section in the 0.10 docs (once its back online). We practically= support an OperatorState interface which let's you keep partitioned st= ate by some key and access it from runtime operators. The states declared u= sing these interfaces are checkpointed and will be restored on failure. Cur= rently all the states are stored in-memory but we are planning to extend it= to allow writing state updates to external systems.

=
I will send you some pointers once the docs are up again= .

Cheers,
Gyula<= /span>

Krzysztof= Zarzycki <k.zarzycki@gmail.com<= /a>> ezt =C3=ADrta (id=C5=91pont: 2015. j=C3=BAn. 30., K, 14:07):
Greetings!
I'm extremely interested in Apache Flink, I think you're doi= ng really a great job! But please allow me to share two things that I would= require from Apache Flink to consider it as groundbreaking (it is what I n= eed for Streaming framework):=C2=A0

2. Support for (serious) stateful processing. What I mean by that i= s to be able to keep state of the application in key-value stores, out-of-c= ore, in embedded mode. I want to be able to keep, let's say history of = events from last two months, grouped & accessible by user_id, and don&#= 39;t want to use external database for that (e.g. Cassandra). Communicating= with external database would kill my performance especially when *reproces= sing* historical data. And I definitely don't want to escape to batch p= rocessing (like in Lambda Architecture).=C2=A0

= These two are the most important (IMHO) lacks in Spark Streaming and are th= e reasons I'm not using it. These two are supported by Samza, which in = code and API is not excellent, but at least allows serious stream processin= g, that does not require repeating the processing pipeline in batch (Hadoop= ).=C2=A0

I'm looking forward to seeing feat= ures like these in Flink. Or they are already there and I'm just missin= g something?

Thanks!
Krzysztof Zarzycki
--089e0160bda43e14260519bb4793--