Return-Path: X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 10E3B18FB9 for ; Thu, 1 Oct 2015 01:34:06 +0000 (UTC) Received: (qmail 79123 invoked by uid 500); 1 Oct 2015 01:34:05 -0000 Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org Received: (qmail 79071 invoked by uid 500); 1 Oct 2015 01:34:05 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hdfs-issues@hadoop.apache.org Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 79058 invoked by uid 99); 1 Oct 2015 01:34:05 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 01 Oct 2015 01:34:05 +0000 Date: Thu, 1 Oct 2015 01:34:05 +0000 (UTC) From: "Yi Liu (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Comment Edited] (HDFS-9053) Support large directories efficiently using B-Tree MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936183#comment-14936183 ] Yi Liu edited comment on HDFS-9053 at 10/1/15 1:33 AM: ------------------------------------------------------- Thanks [~szetszwo] for the comments. {quote} here are quite a few errors in description and the analysis. Let's hold on the patch for the moment and fix all these errors first. {quote} Sure, let's fix the errors in the description and the analysis first, thanks for pointing out this. {quote} > So totally 12+4+4+4+4+4+4 = 32 bytes on 32-bits system/JVM, and 12+4+8+4+8+4 = 40 bytes on 64-bits system/JVM. (I have not counted object alignment) It seems that there is at least one more reference not yet counted since Node is non-static. {quote} I mean the *increased memory* compared with using ArrayList, and I have stated this in above comments. The reference to Node is already counted, use 64-bits system/JVM as example, the first '8' is the reference to the B-Tree root. The second '8' is the null reference for children in BTree#Node. This doesn't count the memory parts which ArrayList also has: 16 bytes object overhead + 4 bytes int modCount + 8 bytes reference to elements + 4 bytes int elementsSize + elements array memory. Nicholas, I am going to fix the two places in the JIRA description and analysis in JIRA comments, which we find so far # fix the time complexity of insert/delete for ArrayList # fix the description about memory usage for B-Tree to add the system/JVM information and add more detailed information. Any other errors do you find besides these? please tell me and appreciate if you can also help to edit them directly. was (Author: hitliuyi): Thanks [~szetszwo] for the comments. {quote} here are quite a few errors in description and the analysis. Let's hold on the patch for the moment and fix all these errors first. {quote} Sure, let's fix the errors in the description and the analysis first, thanks for pointing out this. {quote} > So totally 12+4+4+4+4+4+4 = 32 bytes on 32-bits system/JVM, and 12+4+8+4+8+4 = 40 bytes on 64-bits system/JVM. (I have not counted object alignment) It seems that there is at least one more reference not yet counted since Node is non-static. {quote} I mean the *increased memory* compared with using ArrayList, and I have stated this in above comments. The reference to Node is already counted, use 64-bits system/JVM as example, the first '8' is the reference to the B-Tree root. The second '8' is the null reference for children in BTree#Node. This doesn't count the memory parts which ArrayList also has: 12 bytes object overhead + 4 bytes int modCount + 8 bytes reference to elements + 4 bytes int elementsSize + elements array memory. Nicholas, I am going to fix the two places in the JIRA description and analysis in JIRA comments, which we find so far # fix the time complexity of insert/delete for ArrayList # fix the description about memory usage for B-Tree to add the system/JVM information and add more detailed information. Any other errors do you find besides these? please tell me and appreciate if you can also help to edit them directly. > Support large directories efficiently using B-Tree > -------------------------------------------------- > > Key: HDFS-9053 > URL: https://issues.apache.org/jira/browse/HDFS-9053 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode > Reporter: Yi Liu > Assignee: Yi Liu > Priority: Critical > Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch, HDFS-9053.003.patch > > > This is a long standing issue, we were trying to improve this in the past. Currently we use an ArrayList for the children under a directory, and the children are ordered in the list, for insert/delete, the time complexity is O\(n), (the search is O(log n), but insertion/deleting causes re-allocations and copies of arrays), for large directory, the operations are expensive. If the children grow to 1M size, the ArrayList will resize to > 1M capacity, so need > 1M * 8bytes = 8M (the reference size is 8 for 64-bits system/JVM) continuous heap memory, it easily causes full GC in HDFS cluster where namenode heap memory is already highly used. I recap the 3 main issues: > # Insertion/deletion operations in large directories are expensive because re-allocations and copies of big arrays. > # Dynamically allocate several MB continuous heap memory which will be long-lived can easily cause full GC problem. > # Even most children are removed later, but the directory INode still occupies same size heap memory, since the ArrayList will never shrink. > This JIRA is similar to HDFS-7174 created by [~kihwal], but use B-Tree to solve the problem suggested by [~shv]. > So the target of this JIRA is to implement a low memory footprint B-Tree and use it to replace ArrayList. > If the elements size is not large (less than the maximum degree of B-Tree node), the B-Tree only has one root node which contains an array for the elements. And if the size grows large enough, it will split automatically, and if elements are removed, then B-Tree nodes can merge automatically (see more: https://en.wikipedia.org/wiki/B-tree). It will solve the above 3 issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)