Date: Thu, 6 Jun 2013 14:23:22 +0000 (UTC)
From: "Daryn Sharp (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-4879) Add "blocked ArrayList" collection to avoid CMS full GCs

    [ https://issues.apache.org/jira/browse/HDFS-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677056#comment-13677056 ]

Daryn Sharp commented on HDFS-4879:
-----------------------------------

I think this is a great change, but I agree that ChunkedArrayList should ideally be a full-fledged list. We may find this list implementation useful in other places, which is a benefit over using an actual linked list.

Comments/suggestions:
* The default ctor should invoke the ctor that takes capacity/size, to avoid code duplication.
* Consider avoiding the need to compute size by tracking it via add/remove?
  This would simplify isEmpty() to size == 0.
* Consider removing the multiple calls to addChunk() that seed the main list by folding the logic into add? It could add a new chunk either when the list is empty or when the existing full-chunk logic applies.
* Why does each additional chunk's capacity quadruple? If necessary, it would be more understandable to multiply by 4.

> Add "blocked ArrayList" collection to avoid CMS full GCs
> --------------------------------------------------------
>
>                 Key: HDFS-4879
>                 URL: https://issues.apache.org/jira/browse/HDFS-4879
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.0.0, 2.0.4-alpha
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-4879.txt, hdfs-4879.txt
>
>
> We recently saw an issue where a large deletion was issued which caused 25M blocks to be collected during {{deleteInternal}}. Currently, the list of collected blocks is an ArrayList, meaning that we had to allocate a contiguous 25M-entry array (~400MB). After a NN has been running for a long time, the old generation may become fragmented such that it's hard to find a 400MB contiguous chunk of heap.
> In general, we should try to design the NN such that the only large objects are long-lived and created at startup time. We can improve this particular case (and perhaps some others) by introducing a new List implementation which is made of a linked list of arrays, each of which is size-limited (e.g., to 1MB).
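
For illustration only, the "linked list of size-limited arrays" idea from the issue description, combined with the review suggestions above (size tracked in add, chunk creation folded into add, isEmpty() as size == 0), might be sketched roughly as follows. This is not the ChunkedArrayList from the attached patch: the class name ChunkedListSketch, the fixed per-chunk cap maxChunkSize, and the chunkCount() helper are all hypothetical, and the growing chunk capacity questioned in the comment is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.NoSuchElementException;

/** Hypothetical sketch; NOT the ChunkedArrayList from the HDFS-4879 patch. */
class ChunkedListSketch<T> implements Iterable<T> {
    // Linked list of bounded chunks, so no single large contiguous array is needed.
    private final LinkedList<ArrayList<T>> chunks = new LinkedList<>();
    private final int maxChunkSize; // assumed fixed cap on entries per chunk
    private int size;               // maintained by add(), never recomputed

    ChunkedListSketch(int maxChunkSize) {
        if (maxChunkSize <= 0) {
            throw new IllegalArgumentException("maxChunkSize must be positive");
        }
        this.maxChunkSize = maxChunkSize;
    }

    /** Appends an element, starting a new chunk if the list is empty or the last chunk is full. */
    void add(T element) {
        if (chunks.isEmpty() || chunks.getLast().size() >= maxChunkSize) {
            chunks.add(new ArrayList<>(maxChunkSize));
        }
        chunks.getLast().add(element);
        size++;
    }

    int size() {
        return size;
    }

    boolean isEmpty() {
        return size == 0;
    }

    /** Number of chunks currently allocated (helper for the usage example). */
    int chunkCount() {
        return chunks.size();
    }

    /** Iterates over all elements in insertion order, chunk by chunk. */
    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private final Iterator<ArrayList<T>> outer = chunks.iterator();
            private Iterator<T> inner = null;

            @Override
            public boolean hasNext() {
                // Advance to the next chunk that still has elements.
                while ((inner == null || !inner.hasNext()) && outer.hasNext()) {
                    inner = outer.next().iterator();
                }
                return inner != null && inner.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return inner.next();
            }
        };
    }
}
```

With a cap of 4, adding ten elements yields three chunks (4, 4, and 2 entries), and iteration visits them in insertion order. A real implementation would presumably extend AbstractList to be the "full-fledged list" asked for in the comment; that is omitted here to keep the sketch short.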