hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3366) Shuffle/Merge improvements
Date Mon, 19 May 2008 11:07:55 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-3366:

    Attachment: ifile.patch

Here is an early version of my creatively titled SequenceFile replacement for intermediate
data in Map-Reduce (map-outputs)... IFile stands for "Intermediate File" *smile*.

Unfortunately the Writer isn't as tight as it could be: it needs to copy each key/value into an internal
buffer (see HADOOP-3414 for the details). However, the Reader seems reasonably tight
and makes no copies at all. I chose DataInputBuffer as the key/value type in the
call to Reader.next since it plays nicely by offering an InputStream interface and also the
ability to hand it a raw buffer to work with; it can also be queried to get back the
raw buffer without _any_ copies being made. I'll continue to plug away; feedback is appreciated.
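
To make the zero-copy idea concrete, here is a minimal sketch in plain Java. It is not the code in ifile.patch: RawBufferView is a hypothetical stand-in for a DataInputBuffer-like view, SketchReader is a hypothetical reader, and the record layout (two 4-byte lengths followed by the key and value bytes) is assumed purely for illustration. It shows a next() call that re-points the key and value views at slices of the reader's own array, so the caller gets an InputStream and the raw buffer without any bytes being copied, while the write side still copies into its output buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a DataInputBuffer-like view; NOT Hadoop's class. */
class RawBufferView extends ByteArrayInputStream {
    RawBufferView() { super(new byte[0]); }

    /** Re-points this view at a slice of an existing array; no bytes are copied. */
    void reset(byte[] data, int start, int length) {
        this.buf = data;
        this.pos = start;
        this.mark = start;
        this.count = start + length;
    }

    byte[] getData() { return buf; }   // the backing array itself, not a copy
    int getPosition() { return pos; }  // offset of the next unread byte
}

/** Hypothetical reader; assumed layout is [int keyLen][int valLen][key][value]. */
class SketchReader {
    private final byte[] data;
    private int offset = 0;

    SketchReader(byte[] data) { this.data = data; }

    /** Points key and value at the next record; returns false at end of data. */
    boolean next(RawBufferView key, RawBufferView value) {
        if (offset >= data.length) return false;
        ByteBuffer header = ByteBuffer.wrap(data, offset, 8); // wraps, does not copy
        int keyLen = header.getInt();
        int valLen = header.getInt();
        offset += 8;
        key.reset(data, offset, keyLen);
        value.reset(data, offset + keyLen, valLen);
        offset += keyLen + valLen;
        return true;
    }
}

class ZeroCopySketch {
    /** Appends one record; this side does copy the bytes into the output buffer. */
    static void append(DataOutputStream out, String key, String value) throws IOException {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        out.writeInt(k.length);
        out.writeInt(v.length);
        out.write(k);
        out.write(v);
    }

    /** Builds a small two-record buffer to read back. */
    static byte[] sample() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        append(out, "apple", "1");
        append(out, "banana", "22");
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = sample();
        SketchReader reader = new SketchReader(data);
        RawBufferView key = new RawBufferView();
        RawBufferView value = new RawBufferView();
        while (reader.next(key, value)) {
            String k = new String(key.getData(), key.getPosition(),
                                  key.available(), StandardCharsets.UTF_8);
            // key.getData() == data shows the view aliases the reader's array.
            System.out.println(k + " -> " + value.available()
                + " value bytes, shared=" + (key.getData() == data));
        }
    }
}
```

The same two view objects are reused for every record, so iterating allocates nothing per record. The trade-off in this sketch is that a view is only valid until the next call to next(), and a caller who needs to keep a key or value must copy it explicitly.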

> Shuffle/Merge improvements
> --------------------------
>                 Key: HADOOP-3366
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3366
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.18.0
>         Attachments: 3366.1.patch, 3366.1.patch, ifile.patch
> This is intended to be a meta-issue to track various improvements to shuffle/merge in
> the reducer.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.