From: "Rodrigo Schmidt (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Date: Sun, 14 Feb 2010 09:39:27 +0000 (UTC)
Subject: [jira] Commented: (MAPREDUCE-1491) Use HAR filesystem to merge parity files
Message-ID: <6675304.261741266140367917.JavaMail.jira@brutus.apache.org>
In-Reply-To: <1417798041.259731266122548052.JavaMail.jira@brutus.apache.org>

[
https://issues.apache.org/jira/browse/MAPREDUCE-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833542#action_12833542 ]

Rodrigo Schmidt commented on MAPREDUCE-1491:
--------------------------------------------

Dhruba, thanks for reviewing the code.

As for your question: with the current code, the .har files are never deleted automatically. In the scenario you presented, when you delete one of the files, the har file is left as it is, with all 10 parity files inside. I'm doing that precisely to avoid leaving the other files with less redundancy. Besides, if you recreate one of the files, a new parity file is generated outside the har, and the code on the RaidNode is smart enough to pick the parity file outside the har.

The downside of the current patch is that even if all the files are deleted or recreated, the har file is never deleted and new parity files are created outside it. In the future I plan to fix that and enable the recreation of har files when they become obsolete. I didn't do that now in order to keep the code simple enough to be reviewed and deployed quickly.

Besides, the main idea behind using har on raid is to apply it to files that probably won't change in the future (otherwise recreating things becomes too expensive). The code uses a raid property called time_before_har (set on each policy) to decide when files are old enough to be archived into a har. Setting this property properly will avoid wasting space in most practical cases.

Let me know what you think of this.

> Use HAR filesystem to merge parity files
> -----------------------------------------
>
>                 Key: MAPREDUCE-1491
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1491
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>            Reporter: Rodrigo Schmidt
>            Assignee: Rodrigo Schmidt
>         Attachments: MAPREDUCE-1491.0.patch
>
>
> The HDFS raid implementation (HDFS-503) creates a parity file for every file that is RAIDed.
> This puts additional burden on the memory requirements of the namenode. It will be nice if the parity files are combined together using the HadoopArchive (har) format.
> This was (HDFS-684) before, but raid migrated to MAPREDUCE.
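
For readers following the thread, the two rules described in the comment (prefer a parity file outside the har over the archived copy, and only archive files older than time_before_har) can be sketched roughly as below. This is an illustrative sketch, not the code from the patch: the function names, the use of plain sets of paths, the choice of modification time as the age reference, and the seconds unit are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class ParityLocation:
    """Where the parity data for a source file was found (hypothetical type)."""
    source_path: str
    inside_har: bool


def find_parity(source_path: str,
                loose_parity: Set[str],
                har_parity: Set[str]) -> Optional[ParityLocation]:
    """Pick the parity entry for a source file.

    A parity file outside the har wins, because it was generated after the
    source file was recreated; the entry inside the har may be stale.
    Both sets hold source paths that have parity data (an assumption made
    to keep the sketch independent of any real path layout).
    """
    if source_path in loose_parity:
        return ParityLocation(source_path, inside_har=False)
    if source_path in har_parity:
        return ParityLocation(source_path, inside_har=True)
    return None


def old_enough_to_har(mod_time_s: int, now_s: int, time_before_har_s: int) -> bool:
    """Return True once a file has been unchanged for at least time_before_har.

    Assumption: age is measured from the modification time, in seconds.
    """
    return now_s - mod_time_s >= time_before_har_s
```

In this sketch, a file that was recreated after archiving appears in both sets, and the lookup returns the entry outside the har, which matches the behavior the comment attributes to the RaidNode.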