hadoop-hdfs-issues mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-684) Use HAR filesystem to merge parity files
Date Sat, 21 Nov 2009 15:23:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780996#action_12780996 ]

dhruba borthakur commented on HDFS-684:

Is there a downside to creating a HAR for every directory in the /raid directory? This would
mean one HAR for every data set, because a single data set usually resides in a single
directory.

If you create one RAID file for all files in a certain policy, and we RAID new files every
hour based on that policy, then each of these iterations will have to delete and recreate
the entire HAR, won't it? (I am assuming that you cannot add items to or delete items from a
previously created HAR file.)
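
To make the trade-off concrete, here is a rough sketch of the rewrite cost per hourly iteration under the two layouts. It assumes, as above, that a HAR cannot be modified once created, so any change means rebuilding the whole archive. The function names, directory names, and file counts are made up for illustration and are not taken from the raid code.

```python
# Rough cost model: how many parity files must be re-archived in one
# hourly iteration, assuming a HAR is immutable and must be rebuilt whole.
# All names and numbers here are hypothetical.

def rearchived_one_har_per_policy(existing, new):
    """One HAR for the whole policy: any new file forces a full rebuild."""
    added = sum(new.values())
    if added == 0:
        return 0
    return sum(existing.values()) + added


def rearchived_one_har_per_directory(existing, new):
    """One HAR per directory: only directories with new files are rebuilt."""
    return sum(existing.get(d, 0) + n for d, n in new.items() if n > 0)


# Parity files already archived, keyed by directory under /raid.
existing = {"/raid/logs": 1000, "/raid/clicks": 500, "/raid/users": 200}
# Parity files created during the last hour.
new = {"/raid/logs": 10}

print("per policy:   ", rearchived_one_har_per_policy(existing, new))
print("per directory:", rearchived_one_har_per_directory(existing, new))
```

Under this model the per-directory layout only pays for the data sets that actually changed, while the per-policy layout pays for everything on every iteration that adds a file.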

> Use HAR filesystem to merge parity files 
> -----------------------------------------
>                 Key: HDFS-684
>                 URL: https://issues.apache.org/jira/browse/HDFS-684
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: contrib/raid
>            Reporter: dhruba borthakur
>            Assignee: Rodrigo Schmidt
> The HDFS raid implementation (HDFS-503) creates a parity file for every file that is
> RAIDed. This puts additional burden on the memory requirements of the namenode. It will be
> nice if the parity files are combined together using the HadoopArchive (har) format.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
