Date: Sun, 8 Oct 2017 11:58:00 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: jira@kafka.apache.org
Subject: [jira] [Commented] (KAFKA-5829) Speedup broker startup after unclean shutdown by reducing unnecessary snapshot files deletion

    [ https://issues.apache.org/jira/browse/KAFKA-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196079#comment-16196079 ]

ASF GitHub Bot commented on KAFKA-5829:
---------------------------------------

GitHub user ijuma opened a pull request:

    https://github.com/apache/kafka/pull/4040

    KAFKA-5829; Remove stray `printStackTrace()` in test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ijuma/kafka kafka-5829-follow-up

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/4040.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4040

----
commit c82c2050127b6d1d53a05f7ad73b77a58b0af01e
Author: Ismael Juma
Date:   2017-10-08T11:56:42Z

    Remove stray `printStackTrace()` in test

----

> Speedup broker startup after unclean shutdown by reducing unnecessary snapshot files deletion
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-5829
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5829
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Dong Lin
>            Assignee: Ismael Juma
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> The current Kafka implementation causes slow startup after an unclean shutdown. Loading a partition can take 10X or more the time actually needed. Here is the explanation, with an example:
> - Say we have a partition of 20 segments, each segment holding 250 messages, starting at offset 0. Each message is 1 MB.
> - The broker experiences a hard kill and the index file of the first segment is corrupted.
> - When the broker starts up and loads the first segment, it finds that the index of the first segment is corrupted, so it calls `log.recoverSegment(...)` to recover this segment. This method calls `stateManager.truncateAndReload(...)`, which deletes the snapshot files whose offset is larger than the base offset of the first segment. Thus all snapshot files are deleted.
> - To rebuild the snapshot files, `log.loadSegmentFiles(...)` has to read every message in this partition, even for segments whose log and index files are not corrupted. This increases the time to load this partition by more than an order of magnitude.
> To address this issue, one simple solution is not to delete snapshot files whose offset is larger than the given offset if only the index files need to be rebuilt. More specifically, we should not need to rebuild the producer state offset file unless the log file itself is corrupted or truncated.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
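
The scenario in the issue description can be checked with a small toy model. The sketch below is not Kafka code: `Partition`, `make_partition`, and `mb_read_on_startup` are hypothetical names, and it assumes one producer-state snapshot per segment (at the segment's end offset) and that a segment must be re-read only if its index is corrupted or its snapshot is missing. It compares the data read at startup when snapshots are deleted on index recovery against the proposed behaviour of keeping them.

```python
from dataclasses import dataclass, field

# Numbers taken from the example in the issue description.
MESSAGE_SIZE_MB = 1
MESSAGES_PER_SEGMENT = 250
NUM_SEGMENTS = 20


@dataclass
class Partition:
    """Toy stand-in for a partition; not Kafka's actual data structures."""
    base_offsets: list
    # Assumption: one snapshot per segment, keyed by the segment's end offset.
    snapshot_offsets: set
    corrupted_index: set = field(default_factory=set)


def make_partition():
    """Build the 20-segment partition with a corrupted index on segment 0."""
    bases = [i * MESSAGES_PER_SEGMENT for i in range(NUM_SEGMENTS)]
    snapshots = {base + MESSAGES_PER_SEGMENT for base in bases}
    return Partition(bases, snapshots, corrupted_index={0})


def mb_read_on_startup(partition, keep_snapshots_on_index_rebuild):
    """Return the MB of message data re-read while loading the partition."""
    snapshots = set(partition.snapshot_offsets)
    segment_mb = MESSAGES_PER_SEGMENT * MESSAGE_SIZE_MB
    mb_read = 0
    for base in partition.base_offsets:
        end = base + MESSAGES_PER_SEGMENT
        if base in partition.corrupted_index:
            # Rebuilding the index means scanning the whole segment.
            mb_read += segment_mb
            if not keep_snapshots_on_index_rebuild:
                # Behaviour described in the issue: drop every snapshot
                # whose offset is larger than the segment's base offset.
                snapshots = {s for s in snapshots if s <= base}
        elif end not in snapshots:
            # No snapshot left for this segment, so its messages are re-read.
            mb_read += segment_mb
    return mb_read


if __name__ == "__main__":
    current = mb_read_on_startup(make_partition(), keep_snapshots_on_index_rebuild=False)
    proposed = mb_read_on_startup(make_partition(), keep_snapshots_on_index_rebuild=True)
    print(f"current: {current} MB, proposed: {proposed} MB, ratio: {current // proposed}x")
```

Under these assumptions the model reads 5000 MB with the current behaviour and 250 MB with the proposed one, a 20x difference, which is consistent with the "more than an order of magnitude" estimate in the description.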