flink-user mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Error handling
Date Mon, 16 Nov 2015 18:27:08 GMT
Hi,
I don’t think that alleviates the problem. Sometimes you might want the system to continue
even if stuff outside the UDF fails. For example, if a serializer does not work because of
a null value somewhere. You would, however, like to get a message about this somewhere, I
assume.

Cheers,
Aljoscha
> On 16 Nov 2015, at 19:22, Stephan Ewen <sewen@apache.org> wrote:
> 
> Hi Nick!
> 
> The errors outside your UDF (such as network problems) will be handled by Flink and cause
> the job to go into recovery. They should be transparently handled.
> 
> Just make sure you activate checkpointing for your job!
> 
> Stephan
> 
> 
> On Mon, Nov 16, 2015 at 6:18 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> I have been thinking about this; maybe we can add a special output stream (for example
> Kafka, but it can be generic) that would receive errors/exceptions that were thrown during
> processing. The actual processing would not stop, and the messages in this special stream
> would contain some information about the current state of processing, the input element(s)
> and the machine/VM where the computation is happening.
> 
> Yes, this is precisely what I have in mind. The goal is (1) to not lose input data, and
> (2) to make errors available for operator visibility.
> 
> It's not very portable, but I was able to implement my Maybe<IN, OUT, Throwable>
> type. I can now use it as the output of all my source streams, and split those streams on
> the presence of the Throwable. With this, I'm able to trap certain forms of invalid input
> and send them to an errors sink. However, there are still some error cases that raise
> exceptions, apparently outside of my UDF try block, and these cause the whole streaming
> job to terminate.
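
The Maybe approach described above can be sketched in plain Java. This is only an
illustration, not Nick's actual code: the names MaybeDemo, Maybe, attempt and isFailure are
hypothetical, and it carries the Throwable as a field rather than as a third type parameter.
It uses no Flink API, so it only shows the "trap the exception, then split on its presence"
idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class MaybeDemo {
    // Hypothetical container: the input, plus either an output or the error raised.
    static final class Maybe<IN, OUT> {
        final IN input;
        final OUT output;      // null when the call failed
        final Throwable error; // null when the call succeeded

        private Maybe(IN input, OUT output, Throwable error) {
            this.input = input;
            this.output = output;
            this.error = error;
        }

        // Run fn inside a try block so a bad record becomes a value, not a crash.
        static <IN, OUT> Maybe<IN, OUT> attempt(IN input, Function<IN, OUT> fn) {
            try {
                return new Maybe<>(input, fn.apply(input), null);
            } catch (Exception e) {
                return new Maybe<>(input, null, e);
            }
        }

        boolean isFailure() {
            return error != null;
        }
    }

    public static void main(String[] args) {
        List<Maybe<String, Integer>> good = new ArrayList<>();
        List<Maybe<String, Integer>> bad = new ArrayList<>();
        for (String raw : new String[] {"1", "two", "3"}) {
            Maybe<String, Integer> m = Maybe.attempt(raw, Integer::parseInt);
            // Split on the presence of the Throwable.
            (m.isFailure() ? bad : good).add(m);
        }
        System.out.println(good.size() + " parsed, " + bad.size() + " for the errors sink");
    }
}
```

A wrapper like this can only trap exceptions thrown inside the wrapped function, which
matches the limitation noted above: failures raised outside the UDF never pass through the
try block.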
>  
> > On 11 Nov 2015, at 21:49, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> >
> > Heya,
> >
> > I don't see a section in the online manual dedicated to this topic, so I want to
> > raise the question here: How should errors be handled? Specifically I'm thinking about
> > streaming jobs, which are expected to "never go down". For example, errors can be raised
> > at the point where objects are serialized to/from sources/sinks, and in UDFs. Cascading
> > provides failure traps [0] where erroneous tuples are saved off for post-processing. Is
> > there any such functionality in Flink?
> >
> > I started down the road of implementing a Maybe/Optional type, a POJO Generic triple
> > of <IN, OUT, Throwable> for capturing errors at each stage of a pipeline. However, Java
> > type erasure means even though it compiles, the job is rejected at submission time.
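
For readers unfamiliar with the erasure problem mentioned above, here is a small plain-Java
sketch. It does not reproduce the Flink submission error; it only shows that generic type
arguments are not visible on an ordinary instance at runtime, and that an anonymous subclass
is one standard way to keep them. The names ErasureDemo, sameRuntimeClass and
firstTypeArgument are made up for this example.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // True when both lists have the same runtime class, whatever their element types.
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    // Reads the first type argument recorded in the generic superclass of an
    // anonymous subclass instance, e.g. new ArrayList<String>() {}.
    static Type firstTypeArgument(Object anonymousSubclassInstance) {
        ParameterizedType superType =
            (ParameterizedType) anonymousSubclassInstance.getClass().getGenericSuperclass();
        return superType.getActualTypeArguments()[0];
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        // The compiler tells these apart; the runtime sees one class, ArrayList.
        System.out.println(sameRuntimeClass(strings, numbers));

        // The anonymous subclass keeps String in its class metadata.
        System.out.println(firstTypeArgument(new ArrayList<String>() {}));
    }
}
```

A framework that needs concrete types for a generic container such as the
<IN, OUT, Throwable> triple has to get them from somewhere other than the instance itself,
which is presumably why code that compiles can still be rejected when the job is submitted.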
> >
> > How are other people handling errors in their stream processing?
> >
> > Thanks,
> > Nick
> >
> > [0]: http://docs.cascading.org/cascading/1.2/userguide/html/ch06s03.html
> 
> 
> 

