flink-issues mailing list archives

From "Tzu-Li (Gordon) Tai (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FLINK-5583) Support flexible error handling in the Kafka consumer
Date Tue, 07 Feb 2017 07:47:41 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855488#comment-15855488 ]

Tzu-Li (Gordon) Tai edited comment on FLINK-5583 at 2/7/17 7:47 AM:
--------------------------------------------------------------------

Yes, the collector can be made thread-safe, depending on the implementation. But for the synchronization
to work properly, user implementations of {{deserialize(bytes, collector)}} will need to make
sure that all outputs are added to the collector before returning from the method.

Also note that in the underlying implementation I think we should have a separate collector
for each subscribed Kafka partition; a collector cannot be shared across multiple partitions.
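
For concreteness, here is a minimal sketch of what such a collector-based schema variant could look like. The interface name and method signature are illustrative assumptions for this discussion, not an existing Flink API:

{code:java}
// Hypothetical sketch of the collector-based variant discussed above.
// The interface name and signature are assumptions, not an existing Flink API.
import java.io.IOException;
import java.io.Serializable;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.util.Collector;

public interface CollectorDeserializationSchema<T> extends Serializable {

    /**
     * Deserializes one Kafka record. Implementations must add ALL resulting
     * elements to the collector before returning, so that the consumer can
     * synchronize correctly around this call. The consumer would hand each
     * subscribed partition its own collector instance; a collector must not
     * be shared across partitions.
     */
    void deserialize(byte[] message, Collector<T> out) throws IOException;

    /** Type information for the elements this schema produces. */
    TypeInformation<T> getProducedType();
}
{code}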


> Support flexible error handling in the Kafka consumer
> -----------------------------------------------------
>
>                 Key: FLINK-5583
>                 URL: https://issues.apache.org/jira/browse/FLINK-5583
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kafka Connector
>            Reporter: Haohui Mai
>            Assignee: Haohui Mai
>
> We found that it is valuable to allow applications to handle errors and exceptions in the Kafka consumer in order to build robust applications in production.
> The context is the following:
> (1) We have schematized Avro records flowing through Kafka.
> (2) The decoder implements the DeserializationSchema to decode the records.
> (3) Occasionally there are corrupted records (e.g., schema issues). The streaming pipeline might want to bail out (which is the current behavior) or to skip the corrupted records, depending on the application.
> Two options are available:
> (1) Have a variant of DeserializationSchema that returns a FlatMap-like structure, as suggested in FLINK-3679. (A sketch follows below.)
> (2) Allow applications to catch and handle the exception by exposing APIs similar to the {{ExceptionProxy}}.
> Thoughts?
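
To make option (1) concrete, here is a rough sketch of a deserializer that skips corrupted Avro records instead of failing the job. It builds on the hypothetical {{CollectorDeserializationSchema}} sketched in the comment above; the class name and Avro decoding details are illustrative, not part of any existing API:

{code:java}
// Illustrative sketch of option (1): skip corrupted records rather than
// failing the job. CollectorDeserializationSchema is the hypothetical
// interface sketched earlier in this thread.
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.util.Collector;

public class SkipCorruptAvroSchema implements CollectorDeserializationSchema<GenericRecord> {

    // Avro's Schema is not Serializable, so keep the JSON form and parse lazily.
    private final String schemaJson;
    private transient DatumReader<GenericRecord> reader;

    public SkipCorruptAvroSchema(String schemaJson) {
        this.schemaJson = schemaJson;
    }

    @Override
    public void deserialize(byte[] message, Collector<GenericRecord> out) {
        if (reader == null) {
            Schema schema = new Schema.Parser().parse(schemaJson);
            reader = new GenericDatumReader<>(schema);
        }
        try {
            out.collect(reader.read(null,
                    DecoderFactory.get().binaryDecoder(message, null)));
        } catch (IOException | RuntimeException e) {
            // Corrupted record (e.g. schema issue): emit nothing, i.e. skip it.
            // An application could instead log, count, or route the raw bytes
            // to a side channel here.
        }
    }

    @Override
    public TypeInformation<GenericRecord> getProducedType() {
        return TypeInformation.of(GenericRecord.class);
    }
}
{code}

Emitting zero records for a bad input is exactly the FlatMap-like behavior that option (1) describes, and the catch block is also the natural place to hook in the application-defined error handling that option (2) asks for.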



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
