kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jun He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-4384) ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted message
Date Thu, 10 Nov 2016 07:01:09 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653249#comment-15653249

Jun He commented on KAFKA-4384:

Hi [~becket_qin], I have attached my patch including an unit test, please let me know if there
is anything else I should do. Thanks.

> ReplicaFetcherThread stopped after ReplicaFetcherThread received a corrupted message
> ------------------------------------------------------------------------------------
>                 Key: KAFKA-4384
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4384
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions:,,,
>         Environment: Ubuntu 12.04, AWS D2 instance
>            Reporter: Jun He
>             Fix For:
>         Attachments: KAFKA-4384.patch
> We recently discovered an issue in Kafka (), where ReplicaFetcherThread stopped
after ReplicaFetcherThread received a corrupted message. As the same logic exists also in
Kafka and, they may have the similar issue.
> Here are system logs related to this issue.
> 2016-10-30 - 02:33:05,606 ERROR ReplicaFetcherThread-5-1590174474 ReplicaFetcherThread.apply
- Found invalid messages during fetch for partition [logs,41] offset 39021512238 error Message
is corrupt (stored crc = 2028421553, computed crc = 3577227678)
> 2016-10-30 - 02:33:06,582 ERROR ReplicaFetcherThread-5-1590174474 ReplicaFetcherThread.error
- [ReplicaFetcherThread-5-1590174474], Error due to kafka.common.KafkaException: - error processing
data for partition [logs,41] offset 39021512301 Caused - by: java.lang.RuntimeException: Offset
mismatch: fetched offset = 39021512301, log end offset = 39021512238.
> First, ReplicaFetcherThread got a corrupted message (offset 39021512238) due to some
> Line https://github.com/apache/kafka/blob/
threw exception
> Then, Line https://github.com/apache/kafka/blob/
caught it and logged this error.
> Because https://github.com/apache/kafka/blob/
updated the topic partition offset to the fetched latest one in partitionMap. So ReplicaFetcherThread
skipped the batch with corrupted messages. 
> Based on https://github.com/apache/kafka/blob/,
the ReplicaFetcherThread then directly fetched the next batch of messages (with offset 39021512301)
> Next, ReplicaFetcherThread stopped because the log end offset (still 39021512238) didn't
match the fetched message (offset 39021512301).
> A quick fix is to move line 134 to be after line 138.
> Would be great to have your comments and please let me know if a Jira issue is needed.

This message was sent by Atlassian JIRA

View raw message