Subject: Re: Error handling
From: Aljoscha Krettek
Date: Mon, 16 Nov 2015 19:27:08 +0100
To: user@flink.apache.org
Message-Id: <93EAA3A7-B0CF-4137-86D6-E95C9B5CDBF3@apache.org>

Hi,

I don’t think that alleviates the problem. Sometimes you might want the system to continue even if stuff outside the UDF fails. For example, if a serializer does not work because of a null value somewhere. You would, however, like to get a message about this somewhere, I assume.

Cheers,
Aljoscha

> On 16 Nov 2015, at 19:22, Stephan Ewen wrote:
>
> Hi Nick!
>
> The errors outside your UDF (such as network problems) will be handled by Flink and cause the job to go into recovery. They should be transparently handled.
>
> Just make sure you activate checkpointing for your job!
>
> Stephan
>
>
> On Mon, Nov 16, 2015 at 6:18 PM, Nick Dimiduk wrote:
> I have been thinking about this, maybe we can add a special output stream (for example Kafka, but can be generic) that would get errors/exceptions that were thrown during processing. The actual processing would not stop and the messages in this special stream would contain some information about the current state of processing, the input element(s) and the machine/VM where computation is happening.
>
> Yes, this is precisely what I have in mind. The goal is (1) to not lose input data, and (2) to make errors available for operator visibility.
>
> It's not very portable, but I was able to implement my Maybe type. I can now use it as the output of all my source streams, and split those streams on the presence of the Throwable. With this, I'm able to trap certain forms of invalid input and send it to an errors sink. However, there are still some error cases that cause exceptions, apparently outside of my UDF try block, that cause the whole streaming job to terminate.
>
> > On 11 Nov 2015, at 21:49, Nick Dimiduk wrote:
> >
> > Heya,
> >
> > I don't see a section in the online manual dedicated to this topic, so I want to raise the question here: How should errors be handled? Specifically I'm thinking about streaming jobs, which are expected to "never go down". For example, errors can be raised at the point where objects are serialized to/from sources/sinks, and in UDFs. Cascading provides failure traps [0] where erroneous tuples are saved off for post-processing. Is there any such functionality in Flink?
> >
> > I started down the road of implementing a Maybe/Optional type, a generic POJO triple for capturing errors at each stage of a pipeline. However, Java type erasure means even though it compiles, the job is rejected at submission time.
> >
> > How are other people handling errors in their stream processing?
> >
> > Thanks,
> > Nick
> >
> > [0]: http://docs.cascading.org/cascading/1.2/userguide/html/ch06s03.html
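
For readers of the archive, the Maybe approach discussed in this thread might look roughly like the plain-Java sketch below. It is only an illustration under stated assumptions: the names MaybeSketch, Maybe, of and parse are hypothetical and are not taken from Nick's code or from any Flink API, and a plain loop stands in for splitting a stream. It also traps only exceptions thrown inside the wrapped function, so it does not address failures outside the UDF (for example in a serializer), which is the gap raised above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class MaybeSketch {

    /** Hypothetical holder: the input plus either a result or the Throwable raised while processing it. */
    public static final class Maybe<I, O> {
        public final I input;
        public final O value;          // null when processing failed
        public final Throwable error;  // null when processing succeeded

        private Maybe(I input, O value, Throwable error) {
            this.input = input;
            this.value = value;
            this.error = error;
        }

        /** Applies fn inside a try block so a bad record is captured instead of propagating. */
        public static <I, O> Maybe<I, O> of(I input, Function<I, O> fn) {
            try {
                return new Maybe<>(input, fn.apply(input), null);
            } catch (RuntimeException e) {
                return new Maybe<>(input, null, e);
            }
        }

        public boolean isFailure() {
            return error != null;
        }
    }

    /** Example "UDF": parse a raw record as an integer. */
    public static Maybe<String, Integer> parse(String raw) {
        return Maybe.of(raw, Integer::parseInt);
    }

    public static void main(String[] args) {
        List<String> rawRecords = List.of("1", "2", "oops", "4");
        List<Integer> results = new ArrayList<>();
        List<Maybe<String, Integer>> errors = new ArrayList<>();

        // Split on the presence of the Throwable: good records continue,
        // failed ones go to an "errors sink" together with their original input.
        for (String raw : rawRecords) {
            Maybe<String, Integer> m = parse(raw);
            if (m.isFailure()) {
                errors.add(m);
            } else {
                results.add(m.value);
            }
        }

        System.out.println("results: " + results);
        for (Maybe<String, Integer> e : errors) {
            System.out.println("error for input '" + e.input + "': " + e.error);
        }
    }
}
```

Keeping the original input next to the Throwable is what serves goal (1) from the thread, not losing input data; the errors list plays the role of the errors sink for goal (2).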