kafka-jira mailing list archives

From "Spiros Ioannou (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (KAFKA-5060) Offset not found while broker is rebuilding its index after an index corruption
Date Fri, 01 Sep 2017 08:32:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150200#comment-16150200
] 

Spiros Ioannou edited comment on KAFKA-5060 at 9/1/17 8:31 AM:
---------------------------------------------------------------

Well, it seems we found the issue: we had systemd stopping Kafka, and the default stop timeout
is 90 seconds. After 90 seconds, systemd kills the process with SIGKILL. Raising the stop timeout
to 400 seconds stopped these errors from occurring. It seems Kafka takes about 3 minutes to shut down
after the initial SIGTERM, mostly removing fetchers from partitions. (We have 3 Kafka nodes,
replication factor 2, 1000 partitions * 4 topics.)
Here's the working systemd unit file for reference:


{noformat}
[Unit]
Description=Kafka
After=network.target

[Service]
Type=simple
Environment="KAFKA_OPTS=-XX:ParallelGCThreads=4"
Environment="JAVA_HOME=/opt/jdk8"

#Override environment:
EnvironmentFile=/etc/sysconfig/kafka
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka-config-0.11/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
TimeoutStopSec=400
PIDFile=/run/kafka.pid
Restart=on-failure
RestartSec=10
LimitNOFILE=300000

[Install]
WantedBy=multi-user.target
{noformat}
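On distributions where the unit file is managed by a package, the same timeout can instead be applied through a systemd drop-in, which survives package upgrades. A minimal sketch, assuming the unit is named kafka.service as above:

{noformat}
# /etc/systemd/system/kafka.service.d/override.conf
# Give the broker enough time for a controlled shutdown before systemd
# escalates to SIGKILL (default TimeoutStopSec is 90s)
[Service]
TimeoutStopSec=400
{noformat}

After creating the drop-in, run "systemctl daemon-reload" for it to take effect.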




> Offset not found while broker is rebuilding its index after an index corruption
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-5060
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5060
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.1.0
>            Reporter: Romaric Parmentier
>            Priority: Critical
>
> After rebooting our Kafka servers to change a configuration, one of my consumers running the
old consumer failed to find a new leader for a period of 15 minutes. The topic has a replication
factor of 2.
> When the spare server was finally found and elected leader, the previously consumed
offset could not be found because the broker was rebuilding its index.
> So my consumer fell back to the auto.offset.reset configuration, which is pretty
bad because the offset would exist again 2 minutes later:
> 2017-04-12 14:59:08,568] WARN Found a corrupted index file due to requirement failed:
Corrupt index found, index file (/var/lib/kafka/my_topic-6/00000000130248110337.index) has
non-zero size but the last offset is 130248110337 which is no larger than the base offset
130248110337.}. deleting /var/lib/kafka/my_topic-6/00000000130248110337.timeindex, /var/lib/kafka/my_topic-6/00000000130248110337.index
and rebuilding index... (kafka.log.Log)
> [2017-04-12 15:01:41,490] INFO Completed load of log my_topic-6 with 6146 log segments
and log end offset 130251895436 in 169696 ms (kafka.log.Log)
> Maybe this is handled by the new consumer, or there is some configuration to handle this
case, but I didn't find anything.
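With the new (Java) consumer, one way to avoid silently jumping offsets in this situation is to disable the automatic reset entirely, so the application can decide how to recover. A sketch of the relevant consumer properties; the broker address and group id are placeholder values:

{noformat}
# consumer.properties (new Java consumer; values are examples)
bootstrap.servers=broker1:9092
group.id=my-group
# "none" makes poll() throw NoOffsetForPartitionException when no
# committed offset is found, instead of silently resetting to
# earliest/latest while the broker is still rebuilding its index
auto.offset.reset=none
{noformat}

The application can then catch NoOffsetForPartitionException and retry after a delay, by which time the rebuilt index may have made the committed offset available again.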



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
