hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3638) Cache the iFile index files in memory to reduce seeks during map output serving
Date Tue, 16 Sep 2008 09:55:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631315#action_12631315 ]

Chris Douglas commented on HADOOP-3638:

* Has this been benchmarked yet?
* MapTask
** The code in spillSingleRecord doesn't work unless the single record to be spilled also
belongs to partition 0.
** Enforcing the memory limit when calling getIndexInformation during the merge doesn't make
sense. Before mergeParts is called, the serialization buffer is released, so there should
be plenty of memory for caching indices. getIndexInformation can be pushed into mergeParts
and replaced with code that loads absent IndexRecord arrays into memory.
** It looks like map output indices are accumulated until the total exceeds the 1MB limit.
Why use a HashMap to store sequential Integer keys? (as a quick aside, Integer.valueOf is
unnecessary with autoboxing)
** Whether the memory limit will be exhausted can be calculated at the top of the loop, right?
Verifying it each time seems unnecessary, as does incrementing memory used for each record
(instead of the size of the index at the top), and the checks prior to writing.
** The semantics of writeSpillRecord are unnecessarily complex; it seems likely that this
method can be removed. Since it's known at the top of the spill loop whether each offset will
be cached in memory or written to disk, a branch that calls the existing writeIndexRecord
or that stores an entry in the cache would be much clearer.
* IndexRecord
** (Sharad covered this)
* TaskTracker
** IndexRecord and IndexCache are in the mapred package and don't need to be imported.
* IndexCache
** (discussed offline)
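
To illustrate the MapTask points about deciding once at the top of the spill loop and dropping the HashMap, here is a minimal sketch, not the patch's code. Everything named in it is hypothetical: SpillIndexSketch, IndexRecordSketch and its three fields, the 24-byte per-record size, and a writeIndexRecord stand-in that only counts calls instead of writing to disk. It also assumes the index size is simply the partition count times a fixed record size.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for IndexRecord; the three fields are assumed. */
final class IndexRecordSketch {
    final long startOffset;
    final long rawLength;
    final long partLength;

    IndexRecordSketch(long startOffset, long rawLength, long partLength) {
        this.startOffset = startOffset;
        this.rawLength = rawLength;
        this.partLength = partLength;
    }
}

/** Sketch of a spill loop that decides cache-vs-disk once per spill. */
final class SpillIndexSketch {
    // Assumption: one index record is three longs.
    static final int INDEX_RECORD_BYTES = 24;

    private final int partitions;
    private final long cacheLimitBytes;
    private long cachedBytes = 0;
    private int recordsWrittenToDisk = 0;

    // Spill numbers are sequential, so a list indexed by spill number
    // replaces the HashMap. A null entry marks an index that is on disk.
    private final List<IndexRecordSketch[]> indexCache = new ArrayList<>();

    SpillIndexSketch(int partitions, long cacheLimitBytes) {
        this.partitions = partitions;
        this.cacheLimitBytes = cacheLimitBytes;
    }

    /** Records one spill; returns true if its index was kept in memory. */
    boolean spill(long[] partLengths) {
        // Decide once, before the partition loop, using the whole index size.
        long indexBytes = (long) partitions * INDEX_RECORD_BYTES;
        boolean cacheInMemory = cachedBytes + indexBytes <= cacheLimitBytes;
        IndexRecordSketch[] index =
            cacheInMemory ? new IndexRecordSketch[partitions] : null;

        long offset = 0;
        for (int p = 0; p < partitions; p++) {
            IndexRecordSketch rec =
                new IndexRecordSketch(offset, partLengths[p], partLengths[p]);
            if (cacheInMemory) {
                index[p] = rec;          // store the entry in the cache
            } else {
                writeIndexRecord(rec);   // existing on-disk path
            }
            offset += partLengths[p];
        }

        indexCache.add(index);
        if (cacheInMemory) {
            cachedBytes += indexBytes;   // charged once per spill, not per record
        }
        return cacheInMemory;
    }

    /** Stand-in for the real disk write; here it only counts calls. */
    private void writeIndexRecord(IndexRecordSketch rec) {
        recordsWrittenToDisk++;
    }

    IndexRecordSketch[] cachedIndex(int spillNumber) {
        return indexCache.get(spillNumber);
    }

    int recordsWrittenToDisk() {
        return recordsWrittenToDisk;
    }
}
```

With two partitions and a 48-byte limit, the first spill's index fits and is cached, and the second goes through the disk path. At merge time, any null entry in the list would be the signal to load that spill's index from disk.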

> Cache the iFile index files in memory to reduce seeks during map output serving
> -------------------------------------------------------------------------------
>                 Key: HADOOP-3638
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3638
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.17.0
>            Reporter: Devaraj Das
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.19.0
>         Attachments: hadoop-3638-v1.patch, hadoop-3638-v2.patch, hadoop-3638-v3.patch
> The iFile index files can be cached in memory to reduce seeks during map output serving.
