hadoop-mapreduce-issues mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-865) harchive: Reduce the number of open calls to _index and _masterindex
Date Tue, 18 Aug 2009 01:17:14 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated MAPREDUCE-865:

    Attachment: mapreduce-865-0.patch

Primitive patch for discussion.

bq. So instead of open->read->close _index for each part file, thinking of keeping the
index file open when possible.

Instead of keeping an open handle, this patch simply reads 'Stores' (range of caches) and keeps
the last 5 of them (configurable) in memory.
If the files are typical mapreduce outputs with many part-* files, the number of open calls to
_index will be significantly reduced.
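
To make the idea concrete for discussion, here is a minimal sketch of a bounded, least-recently-used cache of stores. It is not the code in the attached patch: the class name StoreCache, the loader function (which stands in for the open->read->close of _index), and the load counter are all hypothetical and exist only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: keeps only the most recently used index stores in memory. */
class StoreCache<K, V> {
    private final Map<K, V> cache;
    private final Function<K, V> loader; // assumption: stands in for open->read->close of _index
    private int loads = 0;               // counts how many times the loader was invoked

    StoreCache(final int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder=true orders entries from least to most recently used
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // evict the least recently used store once the limit is exceeded
                return size() > capacity;
            }
        };
    }

    /** Returns the cached store, loading it (one simulated open call) on a miss. */
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            cache.put(key, value);
        }
        return value;
    }

    int loadCount() {
        return loads;
    }
}
```

With a capacity of 5, repeated lookups that fall into the same few stores hit memory instead of triggering another load, which is the effect described above for directories full of part-* files.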

> harchive: Reduce the number of open calls to _index and _masterindex
> ----------------------------------------------------------------------
>                 Key: MAPREDUCE-865
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-865
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: harchive
>            Reporter: Koji Noguchi
>            Priority: Minor
>         Attachments: mapreduce-865-0.patch
> When I have a har file with 1000 files in it,
>    % hadoop dfs -lsr har:///user/knoguchi/myhar.har/
> would open/read/close the _index/_masterindex files 1000 times.
> This makes the client slow and adds some load to the namenode as well.
> Is there any way to reduce this number?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
