kafka-jira mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-5829) Speedup broker startup after unclean shutdown by reducing unnecessary snapshot files deletion
Date Sun, 08 Oct 2017 11:58:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196079#comment-16196079 ]

ASF GitHub Bot commented on KAFKA-5829:
---------------------------------------

GitHub user ijuma opened a pull request:

    https://github.com/apache/kafka/pull/4040

    KAFKA-5829; Remove stray `printStackTrace()` in test

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ijuma/kafka kafka-5829-follow-up

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/4040.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4040
    
----
commit c82c2050127b6d1d53a05f7ad73b77a58b0af01e
Author: Ismael Juma <ismael@juma.me.uk>
Date:   2017-10-08T11:56:42Z

    Remove stray `printStackTrace()` in test

----


> Speedup broker startup after unclean shutdown by reducing unnecessary snapshot files deletion
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-5829
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5829
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Dong Lin
>            Assignee: Ismael Juma
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> The current Kafka implementation causes slow startup after an unclean shutdown. Loading
> a partition can take 10x or more the time it actually needs. Here is an explanation with
> an example:
> - Say we have a partition of 20 segments, each holding 250 messages, with offsets starting
>   at 0, and each message is 1 MB.
> - The broker experiences a hard kill and the index file of the first segment is corrupted.
> - When the broker starts up and loads the first segment, it finds that the index of the
>   first segment is corrupted, so it calls `log.recoverSegment(...)` to recover this segment.
>   This method calls `stateManager.truncateAndReload(...)`, which deletes the snapshot files
>   whose offset is larger than the base offset of the first segment. Thus all snapshot files
>   are deleted.
> - To rebuild the snapshot files, `log.loadSegmentFiles(...)` has to read every message in
>   this partition, even in segments whose log and index files are not corrupted. This
>   increases the time to load this partition by more than an order of magnitude.
> To address this issue, one simple solution is not to delete snapshot files whose offsets
> are larger than the given offset if only the index files need to be rebuilt. More
> specifically, we should not need to rebuild the producer state offset file unless the log
> file itself is corrupted or truncated.
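
The arithmetic behind the example in the description can be sketched with a toy model.
This is not Kafka code: `Segment`, `build_partition`, and `bytes_scanned` are hypothetical
names, and the placement of one snapshot at the end offset of every segment is an
assumption made only for illustration. The model counts how many bytes must be read
when the index of the first segment is corrupted, once with the later snapshots deleted
(the behaviour described above) and once with them kept (the proposed behaviour).

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class Segment:
    base_offset: int
    num_messages: int
    message_bytes: int

    @property
    def end_offset(self) -> int:
        return self.base_offset + self.num_messages

    @property
    def size_bytes(self) -> int:
        return self.num_messages * self.message_bytes


def build_partition(num_segments=20, messages_per_segment=250, message_bytes=MB):
    """Partition from the description: 20 segments of 250 messages of 1 MB each."""
    return [
        Segment(i * messages_per_segment, messages_per_segment, message_bytes)
        for i in range(num_segments)
    ]


def bytes_scanned(segments, corrupted_bases, snapshot_offsets, keep_snapshots):
    """Toy estimate of the bytes read while loading the partition."""
    snapshots = set(snapshot_offsets)
    if not keep_snapshots:
        # Behaviour described in the issue: recovering a segment drops every
        # snapshot whose offset is larger than that segment's base offset.
        for base in corrupted_bases:
            snapshots = {o for o in snapshots if o <= base}
    latest = max(snapshots, default=0)

    scanned = 0
    for seg in segments:
        needs_index_rebuild = seg.base_offset in corrupted_bases
        # Producer state must be replayed for anything past the latest snapshot.
        needs_state_rebuild = seg.end_offset > latest
        if needs_index_rebuild or needs_state_rebuild:
            scanned += seg.size_bytes
    return scanned


segments = build_partition()
# Assumption: one snapshot exists at the end offset of every segment.
snapshot_offsets = [seg.end_offset for seg in segments]

current = bytes_scanned(segments, {0}, snapshot_offsets, keep_snapshots=False)
proposed = bytes_scanned(segments, {0}, snapshot_offsets, keep_snapshots=True)
print(current // MB, proposed // MB, current // proposed)
```

Under these assumptions the model reads 5000 MB when the snapshots are deleted and 250 MB
when they are kept, a factor of 20, which is in line with the "10x or more" estimate in
the description.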



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
