Message-ID: <28030455.147571271967602319.JavaMail.jira@thor>
Date: Thu, 22 Apr 2010 16:20:02 -0400 (EDT)
From: "Allen Wittenauer (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] Commented: (HDFS-1104) Fsck triggers full GC on NameNode
In-Reply-To: <7896447.125441271887974133.JavaMail.jira@thor>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

[
https://issues.apache.org/jira/browse/HDFS-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859993#action_12859993 ]

Allen Wittenauer commented on HDFS-1104:
----------------------------------------

#1: atime is important from an operations perspective for random-usage file systems, such as tmp directories. Defaulting it off would run counter to almost every file system that I can think of. [In fact, I can't think of -any- that default it off, but I'm sure there is one out there somewhere.] So a -1 on that idea.

#2 was done to mirror what we saw with POSIX, when fsck specifically hits a file (since fsck mainly works on files, not blocks, like 'real' fsck). I'm slightly concerned about changing this functionality, as I could see it being used during debugging (the only time lots of files are accessed at all is during a nightly fsck). But I recognize this is an extreme edge case.

#3 defeats the point of having atime at all.

#4 just seems like a good idea in general. Why hold it in memory if it isn't getting used? +1

> Fsck triggers full GC on NameNode
> ---------------------------------
>
>                 Key: HDFS-1104
>                 URL: https://issues.apache.org/jira/browse/HDFS-1104
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.0, 0.22.0
>
>
> A NameNode at one of our clusters fell into full GC while fsck was performed. Digging into the problem shows that it is caused by how the NameNode handles the access time of a file.
> Fsck calls open on every file in the checked directory to get the file's block locations. Each open changes the file's access time, which in turn writes a transaction entry to the edit log. The current code optimizes open so that it returns without synchronizing the edit log to disk. As it happened, no other jobs were running in our cluster while fsck was performed.
> No edit log sync was ever called, so all open transactions were kept in memory. When the edit log buffer got full, it automatically doubled its space by allocating a new buffer. Full GC happened when no contiguous space was found while allocating the new, bigger buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
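
The failure mode in the quoted description (access-time transactions piling up in a buffer that doubles whenever it fills, because nothing ever calls sync) can be sketched roughly as below. This is only an illustration under stated assumptions, not the NameNode's actual code: the class `EditLogBufferSketch`, its methods, the 1 KB initial capacity, and the 32-byte transaction size are all hypothetical.

```java
import java.util.Arrays;

/** Hypothetical stand-in for an in-memory edit log buffer; not Hadoop's real class. */
public class EditLogBufferSketch {
    private byte[] buf;
    private int used = 0;
    private int doublings = 0;

    public EditLogBufferSketch(int initialCapacity) {
        buf = new byte[initialCapacity];
    }

    /** Append one transaction; double the backing array until it fits. */
    public void logEdit(byte[] txn) {
        while (used + txn.length > buf.length) {
            // Each doubling needs one new contiguous array of twice the size.
            buf = Arrays.copyOf(buf, buf.length * 2);
            doublings++;
        }
        System.arraycopy(txn, 0, buf, used, txn.length);
        used += txn.length;
    }

    /** Stands in for flushing to disk: the buffered bytes can be reused afterwards. */
    public void sync() {
        used = 0;
    }

    public int capacity() { return buf.length; }

    public int doublings() { return doublings; }

    public static void main(String[] args) {
        byte[] atimeTxn = new byte[32]; // assumed size of one access-time edit

        // Case 1: fsck-style opens with no sync at all.
        EditLogBufferSketch noSync = new EditLogBufferSketch(1024);
        for (int i = 0; i < 100_000; i++) {
            noSync.logEdit(atimeTxn);
        }
        System.out.println("no sync:  capacity=" + noSync.capacity()
                + " doublings=" + noSync.doublings());

        // Case 2: same edits, but something syncs every 32 transactions.
        EditLogBufferSketch withSync = new EditLogBufferSketch(1024);
        for (int i = 0; i < 100_000; i++) {
            withSync.logEdit(atimeTxn);
            if ((i + 1) % 32 == 0) {
                withSync.sync();
            }
        }
        System.out.println("with sync: capacity=" + withSync.capacity()
                + " doublings=" + withSync.doublings());
    }
}
```

In the sketch, the unsynced buffer keeps growing and each growth step asks the heap for a single larger contiguous array, while the periodically synced buffer never needs to grow. That is the contrast behind the report's observation that the problem only appeared when no other jobs were forcing a sync.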