hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5494) IFile.Reader should have a nextRawKey/nextRawValue
Date Fri, 27 Mar 2009 05:59:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689818#action_12689818 ]

Chris Douglas commented on HADOOP-5494:

* In {{Merger.MergeQueue::next}}, there's a null check for the value. Is there any reason
why value can't be final and initialized in the constructor, since the assignment is always to itself?
* {{Merger.Segment::getValue}} can be removed, as can its {{value}} member field.
* Initializing {{valBytes}} to 0 and growing it only enough to fit {{currentValueLength}}
seems too conservative.
* Won't this have the same effect as the current code? Even though the read is deferred, each
{{IFile.Reader}} will still grow and keep large buffers. It won't grow to some multiple of
the large record, but it will still have a memory problem because each reader is maintaining
its own, growing buffer. By not growing exponentially, the current patch will have many more
allocations, too.
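
To make the allocation concern in the last point concrete, here is a minimal sketch that counts how often a buffer must be reallocated under each growth policy. The class and method names are hypothetical and are not taken from the patch; the sequence of value lengths is made up purely for illustration.

```java
/** Hypothetical illustration; not code from the HADOOP-5494 patch. */
class BufferGrowthSketch {
    /** Counts reallocations when the buffer grows only to the exact size requested. */
    static int exactFitAllocations(int[] valueLengths) {
        int capacity = 0;
        int allocations = 0;
        for (int len : valueLengths) {
            if (len > capacity) {
                capacity = len; // grow just enough for this value
                allocations++;
            }
        }
        return allocations;
    }

    /** Counts reallocations when the capacity at least doubles on every growth. */
    static int doublingAllocations(int[] valueLengths) {
        int capacity = 0;
        int allocations = 0;
        for (int len : valueLengths) {
            if (len > capacity) {
                capacity = Math.max(len, capacity * 2); // geometric growth
                allocations++;
            }
        }
        return allocations;
    }

    public static void main(String[] args) {
        // Assumed workload: values whose lengths increase by one byte each time.
        int[] lengths = new int[1000];
        for (int i = 0; i < lengths.length; i++) {
            lengths[i] = i + 1;
        }
        System.out.println("exact-fit allocations: " + exactFitAllocations(lengths));
        System.out.println("doubling allocations:  " + doublingAllocations(lengths));
    }
}
```

With slowly increasing value sizes, the exact-fit policy reallocates on nearly every record, while geometric growth reallocates only a logarithmic number of times.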

If the {{IFile.Reader}} from disk isn't doing any of its own buffering, then it shouldn't
need to keep and grow its own DataInputBuffer (DIB) for the value bytes. It should instead accept the DIB
from the {{Merger.MergeQueue}} and grow that, since the byte[] reference wrapped by the DIB is invalid
after being passed to {{IFile.Reader::nextRawValue}} anyway.
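
A rough sketch of that suggestion, assuming a caller-owned buffer that every reader fills in turn. {{ValueBuffer}} and {{RawValueReader}} are hypothetical stand-ins for the DataInputBuffer and {{IFile.Reader}}, and the 4-byte length prefix is a simplification chosen for the example, not the actual IFile encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical illustration; not code from the HADOOP-5494 patch. */
class SharedBufferSketch {
    /** Stand-in for a buffer owned by the merge queue and shared by all readers. */
    static class ValueBuffer {
        byte[] data = new byte[0];
        int length = 0;

        /** Grows geometrically so that repeated growth stays cheap. */
        void ensureCapacity(int needed) {
            if (needed > data.length) {
                data = new byte[Math.max(needed, data.length * 2)];
            }
        }
    }

    /** Stand-in reader that keeps no value buffer of its own. */
    static class RawValueReader {
        private final DataInputStream in;

        RawValueReader(InputStream in) {
            this.in = new DataInputStream(in);
        }

        /** Reads one length-prefixed value into the caller-supplied buffer. */
        void nextRawValue(ValueBuffer out) throws IOException {
            int len = in.readInt();
            out.ensureCapacity(len);
            in.readFully(out.data, 0, len);
            out.length = len;
        }
    }

    /** Encodes one zero-filled value as a 4-byte length followed by the bytes. */
    static byte[] record(int valueLength) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(valueLength);
        out.write(new byte[valueLength]);
        return bytes.toByteArray();
    }

    /** Reads from two readers through one shared buffer and returns its final capacity. */
    static int run() {
        try {
            ValueBuffer shared = new ValueBuffer();
            RawValueReader large = new RawValueReader(new ByteArrayInputStream(record(1024)));
            RawValueReader small = new RawValueReader(new ByteArrayInputStream(record(16)));
            large.nextRawValue(shared); // grows the shared buffer once
            small.nextRawValue(shared); // reuses the same backing array
            return shared.data.length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("shared buffer capacity: " + run());
    }
}
```

In this arrangement only one large array exists no matter how many readers take part in the merge, whereas per-reader buffers would each retain their own high-water mark.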

> IFile.Reader should have a nextRawKey/nextRawValue
> --------------------------------------------------
>                 Key: HADOOP-5494
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5494
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.21.0
>         Attachments: 5494-1.patch
> Merger.Segment defines only the next() method, which internally calls next(key, value)
> on the underlying IFile stream. This reads both the key and the value bytes. It would
> be good to have Merger.Segment.nextRawKey(), which would read only the key and delay reading
> the value until needed (in Merger.MergeQueue.next()) via a new method, Merger.Segment.nextRawValue().
>
> This would mean that only one value's bytes are loaded at a time, which could reduce the memory
> footprint considerably, depending on how big the values are.
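
The control flow proposed in the description can be sketched as follows. This is an in-memory illustration only: the {{Segment}} class and its record layout are hypothetical, string keys stand in for raw key bytes, and the real Merger compares serialized keys rather than strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration; not code from the HADOOP-5494 patch. */
class LazyMergeSketch {
    /** Stand-in segment: the key is read eagerly, the value only on request. */
    static class Segment {
        private final Iterator<String[]> records; // each record is {key, value}
        private String[] current;
        String key;

        Segment(List<String[]> sortedRecords) {
            this.records = sortedRecords.iterator();
        }

        /** Advances to the next record and exposes only its key. */
        boolean nextRawKey() {
            if (!records.hasNext()) {
                return false;
            }
            current = records.next();
            key = current[0];
            return true;
        }

        /** Materializes the value of the current record. */
        String nextRawValue() {
            return current[1];
        }
    }

    /** Merges sorted segments, asking for a value only from the segment with the smallest key. */
    static List<String> merge(List<Segment> segments) {
        PriorityQueue<Segment> queue =
            new PriorityQueue<Segment>((a, b) -> a.key.compareTo(b.key));
        for (Segment s : segments) {
            if (s.nextRawKey()) {
                queue.add(s);
            }
        }
        List<String> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            Segment top = queue.poll();
            out.add(top.key + "=" + top.nextRawValue()); // the only value read in this step
            if (top.nextRawKey()) {
                queue.add(top);
            }
        }
        return out;
    }

    /** Builds two small sorted segments and merges them. */
    static List<String> demo() {
        List<String[]> first = Arrays.asList(new String[] {"a", "1"}, new String[] {"c", "3"});
        List<String[]> second = Arrays.asList(new String[] {"b", "2"}, new String[] {"d", "4"});
        return merge(Arrays.asList(new Segment(first), new Segment(second)));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Every queued segment holds only its current key; a value is requested from exactly one segment per step, which is where the memory saving described above would come from.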
