hadoop-mapreduce-issues mailing list archives

From "Mac Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2459) Cache HAR filesystem metadata
Date Fri, 13 May 2011 22:22:47 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033350#comment-13033350
] 

Mac Yang commented on MAPREDUCE-2459:
-------------------------------------

Mahadev, thanks for the feedback. I have updated the patch with the following changes:
- Removed '_' from harMetaCache
- Added a modification timestamp check that reparses the index files if necessary. This
addresses the case where the archive is overwritten between two reads from the same process
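
The timestamp check described above can be illustrated roughly as follows. This is only a
minimal sketch, not the code in the attached patch: the class name HarMetaCacheSketch, the
get method, and the parser callback are hypothetical, and it uses no Hadoop APIs. It assumes
the caller supplies the current modification time of the archive's index file (in Hadoop
this would typically be read from FileStatus.getModificationTime()).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch: caches parsed archive metadata, keyed by archive path. */
class HarMetaCacheSketch<M> {

    /** Parsed metadata plus the index modification time it was parsed at. */
    private static final class Entry<M> {
        final long indexModTime;
        final M metadata;

        Entry(long indexModTime, M metadata) {
            this.indexModTime = indexModTime;
            this.metadata = metadata;
        }
    }

    private final Map<String, Entry<M>> cache = new ConcurrentHashMap<>();

    /** Assumed stand-in for the (expensive) index file parsing step. */
    private final Function<String, M> parser;

    HarMetaCacheSketch(Function<String, M> parser) {
        this.parser = parser;
    }

    /**
     * Returns cached metadata for the archive. The index is reparsed when
     * nothing is cached yet or when the supplied modification time differs
     * from the cached one, i.e. the archive was overwritten between reads.
     */
    M get(String archivePath, long currentIndexModTime) {
        Entry<M> entry = cache.get(archivePath);
        if (entry == null || entry.indexModTime != currentIndexModTime) {
            // Two threads may both parse here; the result is the same,
            // so the extra work is wasteful but harmless in this sketch.
            entry = new Entry<>(currentIndexModTime, parser.apply(archivePath));
            cache.put(archivePath, entry);
        }
        return entry.metadata;
    }
}
```

With a cache of this shape, repeated reads of an unchanged archive reuse the parsed index,
and only an overwritten archive pays the parsing cost again.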


> Cache HAR filesystem metadata
> -----------------------------
>
>                 Key: MAPREDUCE-2459
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2459
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: harchive
>            Reporter: Mac Yang
>            Assignee: Mac Yang
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2459.1.patch, MAPREDUCE-2459.2.patch
>
>
> Each HAR file system has two index files that contain information on how files are stored
in the part files. During the block location calculation, these indexes are reread for every
file in the archive. Caching the indexes and the status of the part files will greatly reduce
the number of name node operations during job setup.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
