kafka-dev mailing list archives

From "Ewen Cheslack-Postava (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-2479) Add CopycatExceptions to indicate transient and permanent errors in a connector/task
Date Thu, 27 Aug 2015 17:09:46 GMT
Ewen Cheslack-Postava created KAFKA-2479:

             Summary: Add CopycatExceptions to indicate transient and permanent errors in a connector/task
                 Key: KAFKA-2479
                 URL: https://issues.apache.org/jira/browse/KAFKA-2479
             Project: Kafka
          Issue Type: Sub-task
          Components: copycat
            Reporter: Ewen Cheslack-Postava
            Assignee: Ewen Cheslack-Postava

Sometimes a connector will need to tell the framework that an error occurred, and the
framework could reasonably respond to that error in more than one way.

For source connectors, there's not much they need to indicate since they can block indefinitely.
They probably only need to indicate permanent errors for correctness, though we may want them
to indicate transient errors so we can report health of the task in a metric.

For sink connectors, there are at least a couple of scenarios:
1. A task encounters some error while processing a {{put(records)}} call and was unable to
fully process it, but thinks it could be resolved in the future. The task doesn't want to
see any new records until the issue is resolved, but will need to see the same set of records
again. (It would be nice if the task doesn't have to deal with saving these to a buffer itself.)
2. A task encounters some error while processing data, but it has enqueued/handled the data
passed into the {{put(records)}} call. For example, it may have passed it to some library
which buffers it, but then the library indicated that it is having some connection issues.
The connector might be able to accept more data, but the task is not in a healthy state.
3. The task encounters some error that it decides is unrecoverable. This might just be transient
errors that repeat for long enough that the task thinks it's time to give up. It is unclear what
to do here, but one option is relocating the task to another worker, hoping that the issue
is specific to the worker.
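
One way the three sink scenarios above could map onto exception types is sketched below. Every class, enum, and method name here is hypothetical; the issue only proposes "CopycatExceptions" in general and does not define an API. The choice to treat unrecognized exceptions as permanent is also an assumption.

```java
// Hypothetical base type for errors a connector/task reports to the framework.
class CopycatException extends RuntimeException {
    CopycatException(String message) { super(message); }
}

// Scenario 1: put(records) was not fully processed; the task wants the same
// records again later and no new records until then.
class RetriableBatchException extends CopycatException {
    RetriableBatchException(String message) { super(message); }
}

// Scenario 2: the records were accepted (for example, buffered by a library),
// but the task is not in a healthy state.
class DegradedTaskException extends CopycatException {
    DegradedTaskException(String message) { super(message); }
}

// Scenario 3: the task has decided the error is unrecoverable.
class PermanentTaskException extends CopycatException {
    PermanentTaskException(String message) { super(message); }
}

// Hypothetical set of responses the framework could choose from.
enum FrameworkAction {
    REDELIVER_SAME_BATCH,
    CONTINUE_AND_REPORT_UNHEALTHY,
    STOP_OR_RELOCATE_TASK
}

final class ErrorClassifier {
    private ErrorClassifier() {}

    // Maps an exception thrown by a task to the framework's response.
    static FrameworkAction actionFor(RuntimeException error) {
        if (error instanceof RetriableBatchException) {
            return FrameworkAction.REDELIVER_SAME_BATCH;
        }
        if (error instanceof DegradedTaskException) {
            return FrameworkAction.CONTINUE_AND_REPORT_UNHEALTHY;
        }
        // Assumption: PermanentTaskException and any unrecognized exception
        // are treated as fatal for the task.
        return FrameworkAction.STOP_OR_RELOCATE_TASK;
    }
}
```

Separate types let the task state its intent while the framework keeps the decision about what to do, which matters for scenario 3, where the right response is still open.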

Note that it is generally not safe for sink tasks to do their own backoff, because that could
starve the consumer, which needs to poll() in order to heartbeat. So we need to make sure
whatever mechanism we implement encourages the user to throw an exception and pass control
back to us instead.
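
A minimal sketch of what "pass control back to us" could look like on the framework side, assuming a hypothetical retriable exception and simplified stand-ins for the consumer and the sink task. None of these types are the real Kafka or Copycat classes; only {{put(records)}} and poll() come from the discussion above. The framework keeps the failed batch itself, pauses delivery, and keeps polling on every iteration so heartbeats continue while the task waits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception meaning "this batch could not be processed yet".
class RetriableBatchException extends RuntimeException {
    RetriableBatchException(String message) { super(message); }
}

// Hypothetical stand-in for a sink task.
interface SinkTaskSketch {
    void put(List<String> records);
}

// Hypothetical stand-in for the consumer. Assumption: while paused, poll()
// still runs (and so still heartbeats) but returns no records.
interface ConsumerSketch {
    List<String> poll();
    void pause();
    void resume();
}

final class WorkerLoopSketch {
    private final ConsumerSketch consumer;
    private final SinkTaskSketch task;
    // Batch held by the framework so the task does not have to buffer it.
    private List<String> pending = new ArrayList<>();

    WorkerLoopSketch(ConsumerSketch consumer, SinkTaskSketch task) {
        this.consumer = consumer;
        this.task = task;
    }

    boolean hasPendingBatch() {
        return !pending.isEmpty();
    }

    // One pass of the worker loop. poll() is called unconditionally, so the
    // consumer is never starved while a batch is waiting to be retried.
    void iteration() {
        List<String> polled = consumer.poll();
        if (!pending.isEmpty() && !polled.isEmpty()) {
            throw new IllegalStateException("paused consumer returned records");
        }
        if (pending.isEmpty()) {
            pending = new ArrayList<>(polled);
        }
        if (pending.isEmpty()) {
            return; // nothing to deliver on this pass
        }
        try {
            task.put(pending);
            pending = new ArrayList<>(); // delivered; drop the held batch
            consumer.resume();
        } catch (RetriableBatchException e) {
            consumer.pause(); // keep the batch and redeliver it next pass
        }
    }
}
```

A real loop would also want a delay or backoff between retries and a limit after which the error is treated as permanent; both are left out here to keep the control flow visible.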
