Return-Path: X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 52126109AD for ; Thu, 15 Jan 2015 16:14:33 +0000 (UTC) Received: (qmail 8292 invoked by uid 500); 15 Jan 2015 16:14:34 -0000 Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org Received: (qmail 8240 invoked by uid 500); 15 Jan 2015 16:14:34 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hdfs-issues@hadoop.apache.org Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 8228 invoked by uid 99); 15 Jan 2015 16:14:34 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 15 Jan 2015 16:14:34 +0000 Date: Thu, 15 Jan 2015 16:14:34 +0000 (UTC) From: "Daryn Sharp (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HDFS-7609) startup used too much time to load edits MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278877#comment-14278877 ] Daryn Sharp commented on HDFS-7609: ----------------------------------- I think there be a separate option to disable populating the cache on startup. If someone has a NN that can restart reasonably fast before clients give up, it crashes due to corrupt edits, they restart in recovery, they probably would like clients to recover. [~sureshms], thoughts? > startup used too much time to load edits > ---------------------------------------- > > Key: HDFS-7609 > URL: https://issues.apache.org/jira/browse/HDFS-7609 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode > Affects Versions: 2.2.0 > Reporter: Carrey Zhan > Attachments: recovery_do_not_use_retrycache.patch > > > One day my namenode crashed because of two journal node timed out at the same time under very high load, leaving behind about 100 million transactions in edits log.(I still have no idea why they were not rolled into fsimage.) > I tryed to restart namenode, but it showed that almost 20 hours would be needed before finish, and it was loading fsedits most of the time. I also tryed to restart namenode in recover mode, the loading speed had no different. > I looked into the stack trace, judged that it is caused by the retry cache. So I set dfs.namenode.enable.retrycache to false, the restart process finished in half an hour. > I think the retry cached is useless during startup, at least during recover process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)