kafka-jira mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Oleg Kuznetsov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-4794) Add access to OffsetStorageReader from SourceConnector
Date Fri, 06 Oct 2017 16:49:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194836#comment-16194836

Oleg Kuznetsov commented on KAFKA-4794:

You mentioned that connector cannot read offsets on reconfiguration - do you mean it cannot
do it due to current implementation? So if implementation changes, it can do it?

> Add access to OffsetStorageReader from SourceConnector
> ------------------------------------------------------
>                 Key: KAFKA-4794
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4794
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions:
>            Reporter: Florian Hussonnois
>            Priority: Minor
>              Labels: needs-kip
>             Fix For: 1.1.0
> Currently the offsets storage is only accessible from SourceTask to able to initialize
properly tasks after a restart, a crash or a reconfiguration request.
> To implement more complex connectors that need to track the progression of each task
it would helpful to have access to an OffsetStorageReader instance from the SourceConnector.
> In that way, we could have a background thread that could request a tasks reconfiguration
based on source offsets.
> This improvement proposal comes from a customer project that needs to periodically scan
directories on a shared storage for detecting and for streaming new files into Kafka.
> The connector implementation is pretty straightforward.
> The connector uses a background thread to periodically scan directories. When new inputs
files are detected a tasks reconfiguration is requested. Then the connector assigns a file
subset to each task. 
> Each task stores sources offsets for the last sent record. The source offsets data are:
>  - the size of file
>  - the bytes offset
>  - the bytes size 
> Tasks become idle when the assigned files are completed (in : recordBytesOffsets + recordBytesSize
= fileBytesSize).
> Then, the connector should be able to track offsets for each assigned file. 
> When all tasks has finished the connector can stop them or assigned new files by requesting
tasks reconfiguration. 
> Moreover, another advantage of monitoring source offsets from the connector is detect
slow or failed tasks and if necessary to be able to restart all tasks.
> If you think this improvement is OK, I can work a pull request.
> Thanks,

This message was sent by Atlassian JIRA

View raw message