Message-ID: <1643009428.261111266136589287.JavaMail.jira@brutus.apache.org>
Date: Sun, 14 Feb 2010 08:36:29 +0000 (UTC)
From: "dhruba borthakur (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] Commented: (MAPREDUCE-1491) Use HAR filesystem to merge parity files
In-Reply-To: <1417798041.259731266122548052.JavaMail.jira@brutus.apache.org>

[
https://issues.apache.org/jira/browse/MAPREDUCE-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833528#action_12833528 ]

dhruba borthakur commented on MAPREDUCE-1491:
---------------------------------------------

Code changes look good. I have one question.

Suppose a directory /dhruba has 10 files in it. All files initially have a replication factor of 3. Then the RaidNode creates a xxx_raid.har that replaces all the parity files. Now suppose a user deletes the first file in /dhruba, so /dhruba has only 9 files. This patch will now delete the har file associated with /dhruba. At this point, all 9 files in /dhruba are left with a replication factor of only 2 and no parity data! Am I understanding this right?

Of course, the har file will get recreated pretty soon, but for some amount of time (however small) there could be only two replicas of a block.

If my understanding is correct, can we create another JIRA to address this situation?

> Use HAR filesystem to merge parity files
> -----------------------------------------
>
>                 Key: MAPREDUCE-1491
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1491
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Rodrigo Schmidt
>            Assignee: Rodrigo Schmidt
>         Attachments: MAPREDUCE-1491.0.patch
>
>
> The HDFS raid implementation (HDFS-503) creates a parity file for every file that is RAIDed. This puts additional burden on the memory requirements of the namenode. It will be nice if the parity files are combined together using the HadoopArchive (har) format.
> This was (HDFS-684) before, but raid migrated to MAPREDUCE.