From: "Marcelo Vanzin (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Mon, 28 Apr 2014 17:50:17 +0000
Subject: [jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983288#comment-13983288 ]

Marcelo Vanzin commented on HDFS-6293:
--------------------------------------

Hi Kihwal,

We have developed some code internally that mitigates (but does not eliminate) some of these problems. For an image with 140M entries, it would need roughly 7-8GB of heap space, by my back-of-the-napkin calculations.
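As a quick sanity check on that figure (plain arithmetic on the numbers quoted in this thread, assuming "GB" means 2^30 bytes), a 7-8GB heap for 140M entries works out to roughly 54-61 bytes per inode, while the 1.5GB default heap mentioned in the issue description leaves only about 11.5 bytes per inode. The class name `HeapBudget` below is hypothetical; the program just does that division and is not part of OIV.

```java
/**
 * Back-of-the-envelope heap budget per fsimage entry.
 * Assumption: "GB" in the thread means 2^30 bytes. The inputs are the
 * figures quoted in this discussion, not measurements.
 */
class HeapBudget {
    static final long GIB = 1L << 30;

    /** Average number of heap bytes available per inode entry. */
    static double bytesPerEntry(double heapGiB, long entries) {
        return heapGiB * GIB / entries;
    }

    public static void main(String[] args) {
        long entries = 140_000_000L; // ~140M files/directories in the test image
        System.out.printf("7 GB heap:   %.1f bytes/entry%n", bytesPerEntry(7.0, entries));
        System.out.printf("8 GB heap:   %.1f bytes/entry%n", bytesPerEntry(8.0, entries));
        System.out.printf("1.5 GB heap: %.1f bytes/entry%n", bytesPerEntry(1.5, entries));
    }
}
```

On a 64-bit HotSpot JVM even an empty object typically occupies 12-16 bytes, so any design that keeps one Java object per inode already exceeds the 1.5GB budget for an image of this size.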
Also, it does not generate entries in order like LsrPBImage does, and it is tailored to the use case of listing the contents of the file system (so it completely ignores things like snapshots).

(The reason it still requires a lot of memory is, as you note, that it needs to load information about all inodes into memory; our code is just a little smarter about which information it loads. I don't think it's possible to do much better without changing the data in the fsimage itself.)

If people are OK with those limitations, we could clean up our code and post it as a patch.

> Issues with OIV processing PB-based fsimages
> --------------------------------------------
>
>                 Key: HDFS-6293
>                 URL: https://issues.apache.org/jira/browse/HDFS-6293
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Kihwal Lee
>            Priority: Blocker
>         Attachments: Heap Histogram.html
>
>
> There are issues with OIV when processing fsimages in protobuf format.
> Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes an excessive amount of memory. We have tested with an fsimage with about 140M files/directories. The peak heap usage when processing this image in the pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB). It should be possible to process any image with the default heap size of 1.5GB.
> Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens while there were unexpired tokens in the original image (pre-2.4.0). I did not check whether they were also missing in the new PB fsimage.