hadoop-mapreduce-issues mailing list archives

From "Ramkumar Vadali (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2110) add getArchiveIndex to HarFileSystem
Date Tue, 05 Oct 2010 19:30:35 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918129#action_12918129 ]

Ramkumar Vadali commented on MAPREDUCE-2110:
--------------------------------------------

@Mahadev, I agree that exposing an implementation detail is not good. However, there is
more functionality that we would like to add to HarFileSystem, and we could use this Jira to
discuss it.

Raid creates a parity file for each data file that is raided and has reduced replication.
This saves disk space but doubles the number of inodes, so we create HARs out of the parity
files to reduce the number of new inodes. The HAR part files have reduced replication as
well, so a HAR part file may have missing blocks, which we need to fix.

To regenerate a HAR part file block, we need to identify which parity files/offsets map to
that part file block. This requires new code that parses the HAR index file and maps
partfile:offset -> datafile:offset. This is the functionality that we would actually like
to add. Thoughts?
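
A minimal sketch of such a mapping, to make the idea concrete. It is not HarFileSystem code: the class and method names (HarIndexSketch, parse, locate) are hypothetical, and it assumes a simplified index in which each file entry is one line of the form "<path> file <part-name> <start-offset> <length>". Real index files URL-encode the paths and may carry extra fields, which this sketch ignores.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: maps partfile:offset to the
// archived file and the offset inside that file.
final class HarIndexSketch {

    /** Region of a part file occupied by one archived file. */
    static final class Extent {
        final String archivedPath;
        final long start;
        final long length;

        Extent(String archivedPath, long start, long length) {
            this.archivedPath = archivedPath;
            this.start = start;
            this.length = length;
        }
    }

    /** Lookup result: archived file plus offset within that file. */
    static final class Location {
        final String archivedPath;
        final long offsetInFile;

        Location(String archivedPath, long offsetInFile) {
            this.archivedPath = archivedPath;
            this.offsetInFile = offsetInFile;
        }
    }

    // part file name -> (start offset within the part file -> extent)
    private final Map<String, TreeMap<Long, Extent>> byPart = new HashMap<>();

    /** Builds the lookup table from index lines in the assumed simplified format. */
    static HarIndexSketch parse(List<String> indexLines) {
        HarIndexSketch index = new HarIndexSketch();
        for (String line : indexLines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 5 || !"file".equals(f[1])) {
                continue; // directory entries and blank lines own no bytes
            }
            long start = Long.parseLong(f[3]);
            long length = Long.parseLong(f[4]);
            if (length == 0) {
                continue; // an empty file cannot contain any offset
            }
            index.byPart
                 .computeIfAbsent(f[2], k -> new TreeMap<>())
                 .put(start, new Extent(f[0], start, length));
        }
        return index;
    }

    /** Returns the archived file covering partFile:offset, or null if none does. */
    Location locate(String partFile, long offset) {
        TreeMap<Long, Extent> extents = byPart.get(partFile);
        if (extents == null) {
            return null;
        }
        // Closest extent that starts at or before the requested offset.
        Map.Entry<Long, Extent> entry = extents.floorEntry(offset);
        if (entry == null) {
            return null;
        }
        Extent e = entry.getValue();
        if (offset >= e.start + e.length) {
            return null; // offset falls past the end of that extent
        }
        return new Location(e.archivedPath, offset - e.start);
    }
}
```

Since a missing block covers a range of offsets, a caller would presumably call locate at the block start and then step through the following extents until the end of the block is covered.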

> add getArchiveIndex to HarFileSystem
> ------------------------------------
>
>                 Key: MAPREDUCE-2110
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2110
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Patrick Kling
>            Priority: Minor
>         Attachments: MAPREDUCE-2110.patch
>
>
> This patch adds a public getter for archiveIndex to HarFileSystem, allowing us to access
> the index file corresponding to a har file system (useful for raid).
> Index: src/tools/org/apache/hadoop/fs/HarFileSystem.java
> ===================================================================
> --- src/tools/org/apache/hadoop/fs/HarFileSystem.java   (revision 1004421)
> +++ src/tools/org/apache/hadoop/fs/HarFileSystem.java   (working copy)
> @@ -759,6 +759,13 @@
>    }
>    
>    /**
> +   * returns the archive index
> +   */
> +  public Path getArchiveIndex() {
> +    return archiveIndex;
> +  }
> +
> +  /**
>     * return the top level archive path.
>     */
>    public Path getHomeDirectory() {

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

