kafka-dev mailing list archives

From "Karsten Schnitter (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-7130) EOFException after rolling log segment
Date Tue, 03 Jul 2018 10:58:00 GMT
Karsten Schnitter created KAFKA-7130:
----------------------------------------

             Summary: EOFException after rolling log segment
                 Key: KAFKA-7130
                 URL: https://issues.apache.org/jira/browse/KAFKA-7130
             Project: Kafka
          Issue Type: Bug
          Components: replication
    Affects Versions: 1.1.0
            Reporter: Karsten Schnitter


When rolling a log segment, one of our Kafka clusters got an immediate read error on the same
partition. This led to a flood of log messages containing the corresponding stack traces.
Data was still appended to the partition, but consumers were unable to read from it.
The reason for the exception is unclear.

{noformat}
[2018-07-02 23:53:32,732] INFO [Log partition=ingestion-3, dir=/var/vcap/store/kafka] Rolled
new log segment at offset 971865991 in 1 ms. (kafka.log.Log)
[2018-07-02 23:53:32,739] INFO [ProducerStateManager partition=ingestion-3] Writing producer
snapshot at offset 971865991 (kafka.log.ProducerStateManager)
[2018-07-02 23:53:32,739] INFO [Log partition=ingestion-3, dir=/var/vcap/store/kafka] Rolled
new log segment at offset 971865991 in 1 ms. (kafka.log.Log)
[2018-07-02 23:53:32,750] ERROR [ReplicaManager broker=1] Error processing fetch operation
on partition ingestion-3, offset 971865977 (kafka.server.ReplicaManager)

Caused by: java.io.EOFException: Failed to read `log header` from file channel `sun.nio.ch.FileChannelImpl@2e0e8810`.
Expected to read 17 bytes, but reached end of file after reading 0 bytes. Started read from
position 2147483643.
{noformat}
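
One thing that can be checked directly from the numbers in the log above: the read started
at position 2147483643, which is 4 bytes below the maximum signed 32-bit integer (2147483647),
while the log header is 17 bytes long. The fetch offset is also 14 below the offset at which
the new segment was rolled. The sketch below only does this arithmetic; the helper name and
the choice of the 32-bit limit as a reference point are assumptions for illustration, and
whether this is related to the root cause is not confirmed.

```python
# Values copied from the log excerpt above.
READ_POSITION = 2147483643   # "Started read from position 2147483643"
LOG_HEADER_BYTES = 17        # "Expected to read 17 bytes"
FETCH_OFFSET = 971865977     # offset in the failed fetch operation
ROLL_OFFSET = 971865991      # offset at which the new segment was rolled

# Assumption: the signed 32-bit maximum is used here only as a reference point.
INT32_MAX = 2**31 - 1


def describe_read(position: int, length: int, limit: int = INT32_MAX) -> dict:
    """Hypothetical helper: relate a read of `length` bytes at `position` to `limit`."""
    read_end = position + length
    return {
        "bytes_below_limit": limit - position,  # room left before the limit
        "read_end": read_end,                   # position just past the requested bytes
        "crosses_limit": read_end > limit,      # True if the read would pass the limit
    }


result = describe_read(READ_POSITION, LOG_HEADER_BYTES)
offsets_before_roll = ROLL_OFFSET - FETCH_OFFSET

print(result)
print("fetch offset is", offsets_before_roll, "offsets before the roll offset")
```

With the values from the log, a 17-byte header read starting at that position would extend
past the 32-bit limit, and the failing fetch targets an offset just before the roll point.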

We mitigated the issue by stopping the affected node and deleting the corresponding directory.
Once the partition was recreated for the replica (we use replication-factor 2), the other replica
experienced the same problem. We mitigated it in the same way.

It is unclear to us what caused this issue. Can you help us find the root cause of
this problem?
 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
