hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-5494) IFile.Reader should have a nextRawKey/nextRawValue
Date Mon, 23 Mar 2009 05:33:50 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Devaraj Das updated HADOOP-5494:

    Attachment: 5494-1.patch

This patch (an early one that still needs large-scale testing) does the following:
1) Removes the method next(DataInputBuffer, DataInputBuffer) from the Merger.Segment class
and the IFile.Reader classes.
2) Defines nextRawKey and nextRawValue in those classes.
3) Calls nextRawValue from Merger.MergeQueue.next(); memory for the DataInputBuffer passed in
is allocated only at that point. (This holds for the IFile.Reader implementation; in the other
case, IFile.InMemoryReader, the value is already in memory.)
4) Removes the buffering that IFile does on top of the FileSystem's buffering. The FileSystem-level
buffering should be sufficient.
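
To illustrate points 1) to 3), here is a minimal, self-contained sketch of a reader that separates key reads from value reads. It is not the code in 5494-1.patch and does not use the Hadoop classes: the class name RawSegmentReader, the helper writeRecord, and the record layout (two 4-byte lengths followed by the key and value bytes) are assumptions made only for this example.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical stand-in for a segment reader; not the Hadoop IFile.Reader API. */
public class RawSegmentReader {
    private final DataInputStream in;
    // Length of the value that belongs to the last key read, or -1 if none is pending.
    private int pendingValueLength = -1;

    public RawSegmentReader(DataInputStream in) {
        this.in = in;
    }

    /** Assumed layout: int keyLength, int valueLength, key bytes, value bytes. */
    public static void writeRecord(DataOutputStream out, byte[] key, byte[] value)
            throws IOException {
        out.writeInt(key.length);
        out.writeInt(value.length);
        out.write(key);
        out.write(value);
    }

    /** Reads only the lengths and the key bytes; returns null at end of stream. */
    public byte[] nextRawKey() throws IOException {
        if (pendingValueLength >= 0) {
            // The caller never asked for the previous value, so skip over its bytes.
            skipFully(pendingValueLength);
            pendingValueLength = -1;
        }
        int keyLength;
        try {
            keyLength = in.readInt();
        } catch (EOFException e) {
            return null; // no more records
        }
        pendingValueLength = in.readInt();
        byte[] key = new byte[keyLength];
        in.readFully(key);
        return key;
    }

    /** Allocates and fills the value buffer only now, when the value is needed. */
    public byte[] nextRawValue() throws IOException {
        if (pendingValueLength < 0) {
            throw new IllegalStateException("call nextRawKey() first");
        }
        byte[] value = new byte[pendingValueLength];
        in.readFully(value);
        pendingValueLength = -1;
        return value;
    }

    private void skipFully(int n) throws IOException {
        int remaining = n;
        while (remaining > 0) {
            int skipped = in.skipBytes(remaining);
            if (skipped <= 0) {
                throw new EOFException("stream ended inside a value");
            }
            remaining -= skipped;
        }
    }
}
```

In this sketch a merge loop would call nextRawKey() on every segment to order them, and call nextRawValue() only on the segment whose key is emitted next, so at most one value buffer is allocated at a time.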

Another option is to have a single _next_ method that takes a _key DataInputBuffer_
and returns two things: the filled-in _key DataInputBuffer_ and a _stream_ for the value
(a DataInput), without allocating memory for the value up front (again, this applies only to
the IFile.Reader case). But that can probably be a follow-up jira.
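
A rough sketch of that alternative, under the same assumed record layout (two 4-byte lengths, then key bytes, then value bytes): next() returns the key together with a length-bounded stream over the value, so no value buffer is allocated by the reader. The names StreamingSegmentReader, Entry and BoundedStream are hypothetical and exist only for this illustration.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical reader that hands out the value as a stream instead of a buffer. */
public class StreamingSegmentReader {
    /** A key plus a stream limited to the bytes of its value. */
    public static final class Entry {
        public final byte[] key;
        public final int valueLength;
        public final DataInputStream value;

        Entry(byte[] key, int valueLength, DataInputStream value) {
            this.key = key;
            this.valueLength = valueLength;
            this.value = value;
        }
    }

    private final DataInputStream in;
    private BoundedStream current; // value stream handed out by the last next() call

    public StreamingSegmentReader(DataInputStream in) {
        this.in = in;
    }

    /** Reads the key eagerly and exposes the value lazily; returns null at end of stream. */
    public Entry next() throws IOException {
        if (current != null) {
            current.drain(); // discard whatever the caller left unread
        }
        int keyLength;
        try {
            keyLength = in.readInt();
        } catch (EOFException e) {
            return null;
        }
        int valueLength = in.readInt();
        byte[] key = new byte[keyLength];
        in.readFully(key);
        current = new BoundedStream(in, valueLength);
        return new Entry(key, valueLength, new DataInputStream(current));
    }

    /** Exposes at most a fixed number of bytes of the underlying stream. */
    private static final class BoundedStream extends InputStream {
        private final InputStream src;
        private int remaining;

        BoundedStream(InputStream src, int length) {
            this.src = src;
            this.remaining = length;
        }

        @Override
        public int read() throws IOException {
            if (remaining == 0) {
                return -1;
            }
            int b = src.read();
            if (b < 0) {
                throw new EOFException("stream ended inside a value");
            }
            remaining--;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            if (remaining == 0) {
                return -1;
            }
            int n = src.read(buf, off, Math.min(len, remaining));
            if (n < 0) {
                throw new EOFException("stream ended inside a value");
            }
            remaining -= n;
            return n;
        }

        void drain() throws IOException {
            byte[] scratch = new byte[4096];
            while (remaining > 0) {
                read(scratch, 0, scratch.length);
            }
        }
    }
}
```

The trade-off in this sketch is that the value stream is only valid until the following call to next(), since both share the one underlying input stream.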

> IFile.Reader should have a nextRawKey/nextRawValue
> --------------------------------------------------
>                 Key: HADOOP-5494
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5494
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.21.0
>         Attachments: 5494-1.patch
> Merger.Segment defines only the next() method, which internally calls next(key,value)
on the underlying IFile stream. This reads both the key and the value bytes. It would
be good to have Merger.Segment.nextRawKey(), which would read only the key and delay reading
the value until it is needed (in Merger.MergeQueue.next()) via a new method, Merger.Segment.nextRawValue().

> This would mean that only one value's bytes are loaded at a time, which could reduce the
memory footprint considerably, depending on how big the values are.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
