kafka-jira mailing list archives

From "Jun Rao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-5431) LogCleaner stopped due to org.apache.kafka.common.errors.CorruptRecordException
Date Mon, 19 Jun 2017 20:31:02 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054686#comment-16054686 ]

Jun Rao commented on KAFKA-5431:
--------------------------------

[~crietz], in Log.roll(), we call LogSegment.trim() to reset the size of the log file to the
actual size, which eventually calls FileRecords.truncateTo(). If this is reproducible, could
you add some instrumentation in FileRecords.truncateTo() to see if the logic is actually called
during log rolling?
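
For reference, a rough standalone sketch of that kind of instrumentation (it only mirrors the truncateTo() semantics of shrinking the file to targetSize and returning the number of bytes dropped; it is not the actual FileRecords code) could look like this:

{code}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Standalone sketch, not the Kafka source: illustrates the kind of logging
// that could be dropped into FileRecords.truncateTo() to confirm it runs
// when a segment is rolled, and with which sizes.
public class TruncateToProbe {

    // Mirrors the semantics of truncateTo(targetSize): shrink the file to
    // targetSize bytes and return how many bytes were dropped.
    static int truncateTo(FileChannel channel, Path file, int targetSize) throws IOException {
        long originalSize = channel.size();
        if (targetSize > originalSize)
            throw new IllegalArgumentException(
                "target size " + targetSize + " is larger than current size " + originalSize);
        // The interesting values to capture: file, size before, size after, and thread.
        System.err.printf("TRUNCATE %s: %d -> %d bytes (thread %s)%n",
                file, originalSize, targetSize, Thread.currentThread().getName());
        channel.truncate(targetSize);
        return (int) (originalSize - targetSize);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        int targetSize = Integer.parseInt(args[1]);
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            System.out.println("truncated " + truncateTo(channel, file, targetSize) + " bytes");
        }
    }
}
{code}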

> LogCleaner stopped due to org.apache.kafka.common.errors.CorruptRecordException
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-5431
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5431
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.10.2.1
>            Reporter: Carsten Rietz
>              Labels: reliability
>             Fix For: 0.11.0.1
>
>
> Hey all,
> I have a strange problem with our UAT cluster of 3 Kafka brokers.
> The __consumer_offsets topic was replicated to two instances, and our disks ran full due to a wrong configuration of the log cleaner. We fixed the configuration and updated from 0.10.1.1 to 0.10.2.1.
> Today I increased the replication of the __consumer_offsets topic to 3 and triggered replication to the third broker via kafka-reassign-partitions.sh.
> That went well, but I get many errors like:
> {code}
> [2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for partition [__consumer_offsets,18] offset 0 error Record size is less than the minimum record overhead (14) (kafka.server.ReplicaFetcherThread)
> [2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for partition [__consumer_offsets,24] offset 0 error Record size is less than the minimum record overhead (14) (kafka.server.ReplicaFetcherThread)
> {code}
> I think these errors are due to the full disk event.
> The log cleaner threads died on these corrupt messages:
> {code}
> [2017-06-12 09:59:50,722] ERROR [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
> org.apache.kafka.common.errors.CorruptRecordException: Record size is less than the minimum record overhead (14)
> [2017-06-12 09:59:50,722] INFO [kafka-log-cleaner-thread-0], Stopped  (kafka.log.LogCleaner)
> {code}
> Looking at the files I see that some are truncated and some are just empty:
> {code}
> $ ls -lsh 00000000000000594653.log
> 0 -rw-r--r-- 1 user user 100M Jun 12 11:00 00000000000000594653.log
> {code}
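> To find where the real data ends in such a file, a rough standalone scan like the sketch below may help. It assumes the pre-0.11 on-disk entry layout (an 8-byte offset, a 4-byte size, then the message bytes) and reports the first entry whose size field is below the 14-byte minimum record overhead, which is presumably what the fetcher and the cleaner trip over:
> {code}
> import java.io.IOException;
> import java.nio.ByteBuffer;
> import java.nio.channels.FileChannel;
> import java.nio.file.Paths;
> import java.nio.file.StandardOpenOption;
>
> // Rough diagnostic sketch (not a Kafka tool): walk a pre-0.11 segment file
> // entry by entry and stop at the first entry whose size field is below the
> // 14-byte minimum record overhead, likely the start of the zero padding
> // left in the preallocated file.
> public class FindFirstBadEntry {
>     private static final int LOG_OVERHEAD = 12;        // offset (8) + size (4)
>     private static final int MIN_RECORD_OVERHEAD = 14; // crc + magic + attributes + key size + value size
>
>     public static void main(String[] args) throws IOException {
>         try (FileChannel ch = FileChannel.open(Paths.get(args[0]), StandardOpenOption.READ)) {
>             long pos = 0;
>             ByteBuffer header = ByteBuffer.allocate(LOG_OVERHEAD);
>             while (pos + LOG_OVERHEAD <= ch.size()) {
>                 header.clear();
>                 ch.read(header, pos);
>                 header.flip();
>                 long offset = header.getLong();
>                 int size = header.getInt();
>                 if (size < MIN_RECORD_OVERHEAD) {
>                     System.out.printf("first undersized entry at file position %d: offset=%d size=%d%n",
>                             pos, offset, size);
>                     return;
>                 }
>                 pos += LOG_OVERHEAD + size;
>             }
>             System.out.println("no undersized entries found up to position " + pos);
>         }
>     }
> }
> {code}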
> Sadly I no longer have the logs from the disk full event itself.
> I have three questions:
> * What is the best way to clean this up? Deleting the old log files and restarting the brokers?
> * Why did Kafka not handle the disk full event well? Is this only affecting the cleanup, or may we also lose data?
> * Is this maybe caused by the combination of the upgrade and the full disk?
> And last but not least: keep up the good work. Kafka performs really well, is easy to administer, and has good documentation!



