From: "Konstantin Shvachko (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Date: Wed, 23 Jun 2010 21:43:51 -0400 (EDT)
Subject: [jira] Commented: (HDFS-1071) savenamespace should write the fsimage to all configured fs.name.dir in parallel
Message-ID: <14449285.31131277343831277.JavaMail.jira@thor>
In-Reply-To: <46346724.571991269926307413.JavaMail.jira@brutus.apache.org>

    [ https://issues.apache.org/jira/browse/HDFS-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882000#action_12882000 ]
Konstantin Shvachko commented on HDFS-1071:
-------------------------------------------

So this has nothing to do with safe mode, then. As I understand it, the main thread holds the global (FSNamesystem) lock, and nothing else is going to be executed on the NN at that time. This seems to be the answer to the locking question.
Could you please add JavaDoc to the {{FSImageSaver}} class with a summary of the locking and the image identity issues.

The approach with one serializing thread and the others doing the writes is in fact not harder. The queue growth issue is not a problem, imo. The speed of the total write depends on the slowest writer, so everybody can simply wait until the slowest guy completes the assignment; they will have to wait for him in the end anyway.
The advantage of this approach is that we guarantee everybody writes exactly the same bytes into the image files. With your approach, although the implementation is simpler, it is not obvious that the contents of the image files will be the same; at least it was not obvious to me.

Out of pure curiosity, if you know: is there any benefit to multithreaded writing to directories on the same drive?

> savenamespace should write the fsimage to all configured fs.name.dir in parallel
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-1071
>                 URL: https://issues.apache.org/jira/browse/HDFS-1071
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Dmytro Molkov
>         Attachments: HDFS-1071.2.patch, HDFS-1071.3.patch, HDFS-1071.4.patch, HDFS-1071.5.patch, HDFS-1071.patch
>
>
> If you have a large number of files in HDFS, the fsimage file is very big. When the namenode restarts, it writes a copy of the fsimage to all directories configured in fs.name.dir. This takes a long time, especially if there are many directories in fs.name.dir. Make the NN write the fsimage to all these directories in parallel.
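
For illustration only, the single-serializer approach described in the comment (one thread produces the bytes, one writer per directory consumes them) could look roughly like the sketch below. This is not code from any HDFS-1071 patch: the class name FanOutImageWriter, the save method, the queue capacity of 4 and the output file name "fsimage" are all assumptions made for the example. The bounded queues are what make the serializer wait for the slowest writer, and every writer receives the same byte arrays, so the resulting files are identical by construction.

```java
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not HDFS code: one serializer, one writer thread per directory.
final class FanOutImageWriter {
    // Sentinel instance (compared by reference) that tells a writer the stream has ended.
    private static final byte[] EOF = new byte[0];

    /** Streams every chunk, in order, into a file named "fsimage" in each directory. */
    static void save(Iterable<byte[]> chunks, List<Path> dirs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, dirs.size()));
        List<BlockingQueue<byte[]>> queues = new ArrayList<>();
        List<Future<Void>> writers = new ArrayList<>();
        try {
            for (Path dir : dirs) {
                // Bounded queue: the serializer blocks when the slowest writer falls behind.
                BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(4);
                Callable<Void> writer = () -> {
                    try (OutputStream out = Files.newOutputStream(dir.resolve("fsimage"))) {
                        for (byte[] c = queue.take(); c != EOF; c = queue.take()) {
                            out.write(c);
                        }
                    }
                    return null;
                };
                queues.add(queue);
                writers.add(pool.submit(writer));
            }
            // The calling thread acts as the single serializer and hands the
            // same chunk object to every writer.
            for (byte[] chunk : chunks) {
                for (int i = 0; i < queues.size(); i++) {
                    hand(queues.get(i), writers.get(i), chunk);
                }
            }
            for (int i = 0; i < queues.size(); i++) {
                hand(queues.get(i), writers.get(i), EOF);
            }
            // Wait for all writers; a failed write is rethrown as ExecutionException.
            for (Future<Void> w : writers) {
                w.get();
            }
        } finally {
            pool.shutdownNow();
        }
    }

    // Offers a chunk to one writer, giving up only if that writer has already terminated.
    private static void hand(BlockingQueue<byte[]> queue, Future<Void> writer, byte[] chunk)
            throws InterruptedException {
        while (!queue.offer(chunk, 100, TimeUnit.MILLISECONDS)) {
            if (writer.isDone()) {
                return; // the writer failed; its error surfaces later in get()
            }
        }
    }
}
```

In this sketch the total time is bounded by the slowest directory, which matches the observation above that everybody has to wait for the slowest writer anyway. Whether this beats per-directory serialization in practice would have to be measured.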
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.