kafka-jira mailing list archives

From "Coen Damen (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (KAFKA-6471) seekToEnd and seek give unclear results for Consumer with read_committed isolation level
Date Wed, 24 Jan 2018 07:41:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16337075#comment-16337075 ]

Coen Damen edited comment on KAFKA-6471 at 1/24/18 7:40 AM:
------------------------------------------------------------

Hi Jason, thanks for your reply.

The Use Case is the following.

We are retrieving log files from a machine, and the log records are transformed into Kafka
messages. Writing a log file into Kafka is atomic: in case of a read failure (of a file) or a
write failure (to Kafka), the transaction that writes the messages to Kafka should be aborted
and tried again.

When retrying, or when restarting the "job" after it has been idle for a longer time, we want
to resume where processing was halted, i.e. at the last successfully processed file. For this
I expected to use seekToEnd with a Consumer configured with the read_committed isolation
level. But it moved to the end of the Topic, even though the Topic contained many aborted
messages at the end.

Note: the filename and the index within the file are part of the message, so we want to
retrieve the last successful message and extract the filename from it.
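For reference, the per-file transaction described above looks roughly like this (a minimal sketch, not our actual code; the "logs" topic name, the String serialization, and the key layout are placeholders, and initTransactions() is assumed to have been called once at startup):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FileTransactionSketch {
    // Write one file's records as a single Kafka transaction.
    public static void writeFile(KafkaProducer<String, String> producer,
                                 Iterable<String> lines, String fileName) {
        producer.beginTransaction();
        try {
            int index = 0;
            for (String line : lines) {
                // The filename and the index within the file travel in the key,
                // as in the use case above.
                producer.send(new ProducerRecord<>("logs", fileName + ":" + index++, line));
            }
            producer.commitTransaction();  // the whole file becomes visible atomically
        } catch (Exception e) {
            producer.abortTransaction();   // read/write failure: abort, retry later
            throw e;
        }
    }
}
```

On restart, the question in this ticket is how a read_committed consumer can find the last key committed this way, given that seekToEnd lands past the aborted batches.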

Thank you,

Coen

 


> seekToEnd and seek give unclear results for Consumer with read_committed isolation level
> ----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6471
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6471
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Coen Damen
>            Priority: Major
>
> I am using the transactional KafkaProducer to send messages to a topic. This works fine.
> I use a KafkaConsumer with the read_committed isolation level, and I have an issue with the
> seek and seekToEnd methods. According to the documentation, the seek and seekToEnd methods
> give me the LSO (Last Stable Offset). But this is a bit confusing, as it always gives me the
> same value: the END of the topic, no matter whether the last entry was committed (by the
> Producer) or part of an aborted transaction. For example, after I abort the last 5 attempts
> to insert 20_000 messages each, the last 100_000 records should not be read by the Consumer.
> But seekToEnd moves to the end of the Topic (including the 100_000 messages), while poll()
> does not return them.
> I am looking for a way to retrieve the Last Committed Offset (i.e. the last message
> successfully committed by the Producer). There seems to be no proper API method for this,
> so do I need to roll my own?
> An option would be to move back and poll until no more records are retrieved; this would
> yield the last committed message. But I would assume that Kafka provides a method for this.
> We use Kafka 1.0.0.
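The "roll my own" scan described in the quoted report can be sketched in plain Java. The LogRecord type and its committed flag are stand-ins for what a backwards seek-and-poll loop would discover (Kafka does not expose a committed flag per record directly; a read_committed poll() simply omits aborted records):

```java
import java.util.List;
import java.util.OptionalLong;

// Stand-in for a record whose transactional outcome is known.
record LogRecord(long offset, boolean committed) {}

class LastCommittedFinder {
    // Scan backwards from the end and return the offset of the last
    // committed record, i.e. the offset a "move back and poll until no
    // more records are retrieved" loop would converge on.
    static OptionalLong lastCommittedOffset(List<LogRecord> records) {
        for (int i = records.size() - 1; i >= 0; i--) {
            if (records.get(i).committed()) {
                return OptionalLong.of(records.get(i).offset());
            }
        }
        return OptionalLong.empty();  // no committed record at all
    }
}
```

With offsets 0..4 where 3 and 4 belong to aborted transactions, this returns 2, which is the resume point the reporter is after.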



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
