kafka-jira mailing list archives

From "Randall Hauch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-4107) Support offset reset capability in Kafka Connect
Date Fri, 08 Sep 2017 22:33:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159461#comment-16159461 ]

Randall Hauch commented on KAFKA-4107:
--------------------------------------

Not sure if anyone had any thoughts on how this might work, but the challenge is that source
connectors can define partitions and offsets as maps with any key/value pairs. Yes, we could
make a fairly complex tool that could read and apply some transformation to an existing offset,
but would it be sufficient to have a simpler tool that could:

* output the array of existing partition-offset pairs as JSON (to standard out or to a file?)
* read (from standard in or a file?) a JSON document with an array of partition-offset
pairs that should be written as-is to the offsets topic. A partition-offset pair with a null
offset doc could be used to "remove" the existing offset.
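
As a rough sketch of the "written as-is, null removes" behavior described above, the following Python models the offsets as an in-memory list rather than a Kafka topic. The `partition` and `offset` field names and the `apply_updates` function are assumptions made for illustration; the comment does not define a JSON layout.

```python
import json


def apply_updates(existing, updates):
    """Return the pairs that result from applying `updates` to `existing`.

    Both arguments are lists of dicts shaped like
    {"partition": {...}, "offset": {...} or None}.
    These field names are hypothetical, chosen only for this sketch.
    """
    def key(partition):
        # Partitions are arbitrary maps, so a canonical JSON string
        # serves as the lookup key.
        return json.dumps(partition, sort_keys=True)

    merged = {key(pair["partition"]): pair for pair in existing}
    for pair in updates:
        k = key(pair["partition"])
        if pair["offset"] is None:
            # A null offset doc "removes" the existing offset.
            merged.pop(k, None)
        else:
            # Any other offset doc replaces the old one as-is.
            merged[k] = pair
    return list(merged.values())
```

Pairs that the update document does not mention are left untouched in this sketch, which seems to match the edit-and-reapply workflow, though the proposal does not say so explicitly.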

Main options:
* --group (required): the group identifier of the worker cluster
* --bootstrap-server (required): the address of the initial brokers to connect to
* --topic (required): the name of the offset topic

Export options:
* --export (required): used to specify that the partition-offset pairs are to be read from
the topic and exported to a JSON document/array
* --to-file (optional): the name of the file where the JSON document/array is to be written;
if not provided, it would be written to standard output.

Update options:
* --update (required): used to specify that the specified partition-offset pairs are to be
written to the specified topic.
* --from-file (optional): the name of the file where the JSON document/array is to be read;
if not provided, it would be read from standard input.
* --dry-run (optional): used to signal that the tool should output what it would change, but
should not actually change anything
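
To make the option list concrete, here is a minimal sketch of how it might be parsed, using Python's `argparse`. The real tool would presumably be a Java class behind a shell script. Treating `--export` and `--update` as mutually exclusive modes, and rejecting `--to-file`, `--from-file`, or `--dry-run` in the wrong mode, is this sketch's reading of the proposal rather than something it states. `build_parser` and `parse_args` are hypothetical names.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="kafka-connect-source-offset-reset")
    # Main options
    parser.add_argument("--group", required=True,
                        help="group identifier of the worker cluster")
    parser.add_argument("--bootstrap-server", required=True,
                        help="address of the initial brokers to connect to")
    parser.add_argument("--topic", required=True,
                        help="name of the offset topic")
    # Exactly one mode must be chosen (an assumption of this sketch).
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--export", action="store_true")
    mode.add_argument("--update", action="store_true")
    # Mode-specific options
    parser.add_argument("--to-file", help="export target; default is stdout")
    parser.add_argument("--from-file", help="update source; default is stdin")
    parser.add_argument("--dry-run", action="store_true",
                        help="report what would change without writing")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.to_file and not args.export:
        parser.error("--to-file only applies with --export")
    if (args.from_file or args.dry_run) and not args.update:
        parser.error("--from-file and --dry-run only apply with --update")
    return args
```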

For example, the following would export the current source partition-offset pairs:
{code}
bin/kafka-connect-source-offset-reset.sh --export --group=my-group --bootstrap-server=localhost:9092 \
    --topic=offset-topic --to-file=my-offsets.json
{code}

The user can then edit the file as needed, including changing to null any of the offset doc
values that are to be removed. To apply the changes, the user would then run the following
command to read in the file and update source partition-offset pairs in the topic:
{code}
bin/kafka-connect-source-offset-reset.sh --update --group=my-group --bootstrap-server=localhost:9092 \
    --topic=offset-topic --from-file=my-offsets.json
{code}

This tool would only work if the messages in the Kafka Connect offset topic were serialized
with the JSON converter (as set by the worker's `internal.key.converter` and `internal.value.converter` properties).
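
The dependency on the JSON converter can be illustrated with a small decoding sketch. It assumes, for illustration only, that each record key is a JSON array of `[connector name, partition map]` and each value is a JSON offset map or absent, with converter schemas disabled. The comment does not spell out that layout, and `decode_offset_record` is a hypothetical helper.

```python
import json


def decode_offset_record(key_bytes, value_bytes):
    """Decode one offsets-topic record into a partition-offset pair.

    Assumed layout (not confirmed by the proposal): the key is the JSON
    array [connector_name, partition_map], and the value is a JSON offset
    map or None. Raises ValueError if the bytes are not UTF-8 JSON.
    """
    try:
        key = json.loads(key_bytes.decode("utf-8"))
        offset = (None if value_bytes is None
                  else json.loads(value_bytes.decode("utf-8")))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        # Records written with any other converter end up here.
        raise ValueError("offset record is not JSON-serialized") from exc
    connector, partition = key
    return {"connector": connector, "partition": partition, "offset": offset}
```

Records written by a binary converter would fail at the decode step, which is why such a tool could not export or rewrite them.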

> Support offset reset capability in Kafka Connect
> ------------------------------------------------
>
>                 Key: KAFKA-4107
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4107
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>            Reporter: Jason Gustafson
>
> It would be useful in some cases to be able to reset connector offsets. For example,
> if a topic in Kafka corresponding to a source database is accidentally deleted (or deleted
> because of corrupt data), an administrator may want to reset offsets and reproduce the log
> from the beginning. It may also be useful to have support for overriding offsets, but that
> seems like a less likely use case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
