kafka-jira mailing list archives

From "Ewen Cheslack-Postava (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-6490) JSON SerializationException Stops Connect
Date Wed, 07 Feb 2018 04:15:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-6490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354945#comment-16354945 ]

Ewen Cheslack-Postava commented on KAFKA-6490:

A change in behavior like that would definitely require a KIP – existing users would not
expect this at all.

Connect started with the current behavior because for many users losing data is worse than
suffering some downtime. However, it's clear some alternatives are warranted; this question
comes up from time to time on mailing lists. Generally there are only a few options that seem
to make sense:
 * Stop processing (current behavior) and log
 * Log and retry (really only makes sense for unusual edge cases where data got corrupted
in flight between Kafka and Connect)
 * Discard and log (I care about uptime more than a bit of lost data)
 * Dead letter queue (or some other fallback handler)

The retry case is probably the least important here, as it will rarely make a difference, so
the other three are the ones I think we'd want to implement. A KIP for this should be straightforward,
though the implementation will require care to make sure we handle all places errors can occur
(in the producer/consumer, during deserialization, during transformations, etc).
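The options listed above could be expressed as a per-record policy applied at each failure site. The sketch below is purely illustrative; none of these class or enum names exist in Connect, and the toy converter merely stands in for the deserialization/transformation steps mentioned above:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the error-handling options as a per-record policy.
// These names are illustrative only, not Connect's actual API.
public class ErrorPolicySketch {

    enum ErrorPolicy { FAIL, RETRY, SKIP, DEAD_LETTER }

    // Returns the records that survive conversion under the given policy;
    // deadLetters collects records routed to a fallback queue.
    static List<String> apply(ErrorPolicy policy, List<String> records,
                              List<String> deadLetters) {
        List<String> out = new ArrayList<>();
        for (String r : records) {
            try {
                out.add(convert(r)); // stand-in for deserialization/transformation
            } catch (IllegalArgumentException e) {
                switch (policy) {
                    case FAIL:        throw e;                     // current behavior: stop the task
                    case RETRY:       /* re-fetch and retry; omitted */ break;
                    case SKIP:        System.err.println("Skipping: " + r); break;
                    case DEAD_LETTER: deadLetters.add(r); break;   // hand off to a fallback handler
                }
            }
        }
        return out;
    }

    // Toy converter: rejects anything that does not look like JSON.
    static String convert(String raw) {
        if (!raw.startsWith("{")) throw new IllegalArgumentException("not JSON: " + raw);
        return raw.toUpperCase();
    }
}
```

The interesting part is that the catch block, not the loop, is where the policy decision lives, so the same dispatch could wrap any of the failure sites above.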

> JSON SerializationException Stops Connect
> -----------------------------------------
>                 Key: KAFKA-6490
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6490
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 1.0.0
>            Reporter: William R. Speirs
>            Priority: Major
>         Attachments: KAFKA-6490_v1.patch
> If you configure KafkaConnect to parse JSON messages, and you send it a non-JSON message,
> the SerializationException will bubble up to the top and stop KafkaConnect. While I understand
> that sending non-JSON to a JSON serializer is a bad idea, I think that a single malformed
> message stopping all of KafkaConnect is even worse.
> The data exception is thrown here: [https://github.com/apache/kafka/blob/trunk/connect/json/src/main/java/org/apache/kafka/connect/json/JsonConverter.java#L305]
> From the call here: [https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L476]
> This bubbles all the way up to the top, and KafkaConnect simply stops with the message:
> {{ERROR WorkerSinkTask{id=elasticsearch-sink-0} Task threw an uncaught and unrecoverable
> exception (org.apache.kafka.connect.runtime.WorkerTask:172)}}
> Thoughts on adding a {{try/catch}} around the {{for}} loop in WorkerSinkTask's {{convertMessages}},
> so that messages that don't parse properly are logged but simply ignored? That way KafkaConnect
> could keep working even when it encounters a message it cannot decode.
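The reporter's try/catch suggestion can be sketched as follows. This is not WorkerSinkTask's actual code; the method and parameter names are illustrative, and the converter is injected so the failure can come from any deserializer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch of "catch, log, and continue" inside the conversion
// loop, so one undecodable record does not abort the whole batch or the task.
public class ConvertAndSkip {
    static List<String> convertMessages(List<byte[]> batch,
                                        Function<byte[], String> converter) {
        List<String> converted = new ArrayList<>();
        for (byte[] record : batch) {
            try {
                converted.add(converter.apply(record)); // may throw on malformed input
            } catch (RuntimeException e) {
                // log and move on; the rest of the batch is still delivered
                System.err.println("Dropping undecodable record: " + e.getMessage());
            }
        }
        return converted;
    }
}
```

As noted in the comment above, silently discarding records is only acceptable when uptime matters more than a bit of lost data, which is why this would need to be opt-in behavior.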

This message was sent by Atlassian JIRA
