Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 3 May 2017 14:33:04 +0000 (UTC)
From: "Daryn Sharp (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.13064497.1492464196000.115843.1493821984735@Atlassian.JIRA>
In-Reply-To: <JIRA.13064497.1492464196000@Atlassian.JIRA>
References: <JIRA.13064497.1492464196000@Atlassian.JIRA> <JIRA.13064497.1492464196478@jira-lw-us.apache.org>
Subject: [jira] [Commented] (HDFS-11661) GetContentSummary uses excessive
 amounts of memory
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Wed, 03 May 2017 14:33:11 -0000


    [ https://issues.apache.org/jira/browse/HDFS-11661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994982#comment-15994982 ] 

Daryn Sharp commented on HDFS-11661:
------------------------------------

Please hold off on commit until later this week. There are more bugs related to snapshots and content summary and quota usage discrepancies. I almost have a patch ready that optimizes content summary and appears to fix the snapshot issues.

> GetContentSummary uses excessive amounts of memory
> --------------------------------------------------
>
>                 Key: HDFS-11661
>                 URL: https://issues.apache.org/jira/browse/HDFS-11661
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.8.0, 3.0.0-alpha2
>            Reporter: Nathan Roberts
>            Assignee: Wei-Chiu Chuang
>            Priority: Blocker
>         Attachments: HDFS-11661.001.patch, HDFs-11661.002.patch, Heap growth.png
>
>
> ContentSummaryComputationContext::nodeIncluded() is being used to keep track of all INodes visited during the current content summary calculation. This can be all of the INodes in the filesystem, making for a VERY large hash table. This simply won't work on large filesystems. 
> We noticed this after an upgrade, when a namenode with ~100 million filesystem objects began spending significantly more time in GC. Fortunately this system had some memory breathing room; other clusters we have will not run with this additional demand on memory.
> This was added as part of HDFS-10797 as a way of keeping track of INodes that have already been accounted for - to avoid double counting.
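
To illustrate the memory concern in the description above, here is a minimal sketch of a per-call "already counted" set. It is not the Hadoop code: the class VisitedSetSketch and the methods markIncluded, trackedCount, and countOnce are hypothetical names, and inodes are assumed to be identified by a long id. The second pass over the ids stands in for any path by which the same inode could be reached twice.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the Hadoop implementation.
final class VisitedSetSketch {
    // Stand-in for the tracking set: one entry per distinct inode visited.
    private final Set<Long> included = new HashSet<>();

    /** Returns true the first time an inode id is seen, false on repeats. */
    boolean markIncluded(long inodeId) {
        return included.add(inodeId);
    }

    /** Number of entries the set is holding for this calculation. */
    int trackedCount() {
        return included.size();
    }

    /** Visits ids 0..n-1 twice and counts each id only once. */
    static long countOnce(VisitedSetSketch ctx, long n) {
        long counted = 0;
        for (int pass = 0; pass < 2; pass++) {
            for (long id = 0; id < n; id++) {
                if (ctx.markIncluded(id)) {
                    counted++;
                }
            }
        }
        return counted;
    }

    public static void main(String[] args) {
        VisitedSetSketch ctx = new VisitedSetSketch();
        long counted = countOnce(ctx, 1_000_000L);
        // The set ends up with one boxed entry per inode visited.
        System.out.println("counted=" + counted + " tracked=" + ctx.trackedCount());
    }
}
```

The set does prevent double counting, but its size grows linearly with the number of inodes visited, so a summary over the whole namespace holds one entry per filesystem object for the duration of the call.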

