Date: Wed, 3 May 2017 14:33:04 +0000 (UTC)
From: "Daryn Sharp (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-11661) GetContentSummary uses excessive amounts of memory

    [ https://issues.apache.org/jira/browse/HDFS-11661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994982#comment-15994982 ]

Daryn Sharp commented on HDFS-11661:
------------------------------------

Please hold off on commit until later this week. There are more bugs related to snapshots and content summary and quota usage discrepancies. I almost have a patch ready that optimizes content summary and appears to fix the snapshot issues.

> GetContentSummary uses excessive amounts of memory
> --------------------------------------------------
>
>                 Key: HDFS-11661
>                 URL: https://issues.apache.org/jira/browse/HDFS-11661
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.8.0, 3.0.0-alpha2
>            Reporter: Nathan Roberts
>            Assignee: Wei-Chiu Chuang
>            Priority: Blocker
>         Attachments: HDFS-11661.001.patch, HDFs-11661.002.patch, Heap growth.png
>
>
> ContentSummaryComputationContext::nodeIncluded() is being used to keep track of all INodes visited during the current content summary calculation.
> This can be all of the INodes in the filesystem, making for a VERY large hash table. This simply won't work on large filesystems.
> We noticed this after upgrading: a namenode with ~100 million filesystem objects was spending significantly more time in GC. Fortunately, this system had some memory breathing room; other clusters we have will not run with this additional demand on memory.
> This was added as part of HDFS-10797 as a way of keeping track of INodes that have already been accounted for, to avoid double counting.
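
The memory concern in the issue description can be illustrated with a minimal Java sketch. It is not the HDFS code: the class VisitedSetSketch, its Node type, and the methods markIncluded, summarize, count, and trackedEntries are all hypothetical names made up for this illustration. It assumes only what the description states, namely that every node visited during a summary is recorded in a hash table so that a node reachable by two paths is counted once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustration only; none of these types are HDFS classes. */
final class VisitedSetSketch {

    /** Hypothetical stand-in for an inode. */
    static final class Node {
        final long id;
        final List<Node> children = new ArrayList<>();
        Node(long id) { this.id = id; }
    }

    // One entry per node visited: this is the table that grows with the tree.
    private final Set<Long> included = new HashSet<>();
    private long count = 0;

    /** Returns true the first time a node is seen, false on repeats. */
    boolean markIncluded(Node n) {
        return included.add(n.id);
    }

    /** Counts every node under root exactly once. */
    void summarize(Node root) {
        if (!markIncluded(root)) {
            return; // already counted through another path
        }
        count++;
        for (Node child : root.children) {
            summarize(child);
        }
    }

    long count() { return count; }

    int trackedEntries() { return included.size(); }

    public static void main(String[] args) {
        Node root = new Node(0);
        Node dirA = new Node(1);
        Node dirB = new Node(2);
        Node shared = new Node(3); // reachable through both dirA and dirB
        root.children.add(dirA);
        root.children.add(dirB);
        dirA.children.add(shared);
        dirB.children.add(shared);
        for (long i = 4; i < 1000; i++) {
            dirA.children.add(new Node(i));
        }

        VisitedSetSketch sketch = new VisitedSetSketch();
        sketch.summarize(root);
        // prints "counted=1000 tracked=1000"
        System.out.println("counted=" + sketch.count()
                + " tracked=" + sketch.trackedEntries());
    }
}
```

In this sketch the guard prevents the shared node from being counted twice, but the set ends up holding one entry for every node visited, even though only one node could have been double counted. At the scale mentioned in the description (~100 million objects) a table of that size is what drives the extra heap use and GC time.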