Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Reply-To: core-dev@hadoop.apache.org
Message-ID: <300796534.1209697375780.JavaMail.jira@brutus>
Date: Thu, 1 May 2008 20:02:55 -0700 (PDT)
From: "Konstantin Shvachko (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-3248) Improve Namenode startup performance
In-Reply-To: <1129580591.1208196184949.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593705#action_12593705 ]

Konstantin Shvachko commented on HADOOP-3248:
---------------------------------------------

- INode.getLocalNameBytes() vs .getLocalNameInBytes()
- byteStore is of size 512K. Our file names are limited by MAX_PATH_LENGTH = 8000, and UTF-8 is not more than 4 bytes per character, so you don't need a store larger than 32K bytes. I would use an even smaller buffer; it is OK if the buffer is reallocated when we have really long names.
- byteStore should be a local variable as long as strBuf is a static member.
- All static members "used for saving the image to disk", like fileperm and strBuf, should be extremely private.
- If you introduce a static PermissionStatus.write(out,u,g,p), then you should call it in PermissionStatus.write(out); otherwise we have two ways to serialize a PermissionStatus object.

Everything else looks great.

> Improve Namenode startup performance
> ------------------------------------
>
>                 Key: HADOOP-3248
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3248
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: girish vaitheeswaran
>            Assignee: dhruba borthakur
>         Attachments: fastRestarts.patch, fastRestarts.patch, fastRestarts2.patch, FSImage.patch
>
>
> One of the things that would need to be addressed as part of Namenode scalability is the HDFS recovery performance, especially in scenarios where the number of files is large. There are instances where the number of files is in the vicinity of 20 million, and in such cases the time taken for namenode startup is prohibitive. Here are some benchmark numbers on the time taken for namenode startup. These times do not include the time to process block reports.
> Default scenario for 20 million files with the max java heap size set to 14GB : 40 minutes
> Tuning various java options such as young size, parallel garbage collection, initial java heap size : 14 minutes
> As can be seen, 14 minutes is still a long time for the namenode to recover, and code changes are required to bring this time down further.
> To this end some prototype optimizations were done to reduce this time. Based on some timing analysis, saveImage and loadFSImage were the primary methods that were consuming most of the time. Most of the time was being spent on doing object allocations. The goal of the optimizations is to reduce the number of memory allocations as much as possible.
> Optimization 1: saveImage()
> ======================
> Avoid allocation of the UTF8 object.
> Old code
> =======
> new UTF8(fullName).write(out);
> New Code
> ========
> out.writeUTF(fullName)
> Optimization 2: saveImage()
> ======================
> Avoid object allocation of the PermissionStatus object and the FsPermission object. This is to be done for directories and for files.
> Old code
> =======
> fileINode.getPermissionStatus().write(out)
> New Code
> =========
> out.writeBytes(fileINode.getUserName())
> out.writeBytes(fileINode.getGroupName())
> out.writeShort(fileINode.getFsPermission().toShort())
> Optimization 3
> ============
> loadImage() could use the same mechanism, where we would avoid allocating the PermissionStatus object and the FsPermission object.
> Optimization 4
> ============
> A hack was tried out to avoid the cost of object allocation from saveImage(), where the fullName was being constructed using string concatenation. This optimization also helped improve performance.
> Overall these optimizations helped bring the overall startup time down to slightly over 7 minutes. Most of the remaining time is now spent in loadFSImage(), since we allocate the INode and INodeDirectory objects. Any further optimizations will need to focus on loadFSImage().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
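
For illustration only, the last review point (a static PermissionStatus.write(out,u,g,p) should be the one serialization path, with PermissionStatus.write(out) calling it) can be sketched roughly as below. This is not the Hadoop class or the attached patch: PermissionStatusSketch is a hypothetical stand-in, the permission is carried as a plain short, and java.io.DataOutput.writeUTF is assumed as the string encoding purely to keep the example self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical stand-in for Hadoop's PermissionStatus; the name and the
// wire format are assumptions made for illustration only.
final class PermissionStatusSketch {
    private final String user;
    private final String group;
    private final short permission;

    PermissionStatusSketch(String user, String group, short permission) {
        this.user = user;
        this.group = group;
        this.permission = permission;
    }

    // The only place that defines the serialized layout. Callers that already
    // hold the three fields can use it without allocating a status object.
    static void write(DataOutput out, String user, String group, short permission)
            throws IOException {
        out.writeUTF(user);
        out.writeUTF(group);
        out.writeShort(permission);
    }

    // The instance form delegates to the static form, so both paths emit
    // exactly the same bytes.
    void write(DataOutput out) throws IOException {
        write(out, user, group, permission);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream viaObject = new ByteArrayOutputStream();
        new PermissionStatusSketch("hdfs", "supergroup", (short) 0644)
                .write(new DataOutputStream(viaObject));

        ByteArrayOutputStream viaStatic = new ByteArrayOutputStream();
        write(new DataOutputStream(viaStatic), "hdfs", "supergroup", (short) 0644);

        // Compare the two serialized forms byte for byte.
        System.out.println(Arrays.equals(viaObject.toByteArray(), viaStatic.toByteArray()));
    }
}
```

With the delegation in place, a later change to the layout only has to be made in the static method, which is the concern behind "two ways to serialize".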