hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3366) Shuffle/Merge improvements
Date Fri, 16 May 2008 13:09:55 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Devaraj Das updated HADOOP-3366:

    Attachment: 3366.1.patch

(After an offline discussion, I agree with the suggestion that we should not use the file
abstraction for the in-memory merge. The file streams add overhead, which is undesirable
in a performance-critical section.)
This half-done patch is up for a high-level review. It introduces a ByteArrayManager that
shuffle can use to store files as raw byte arrays rather than as files in the ramfs. It also
defines a merge routine that can merge a set of such byte arrays. The remaining work, i.e.,
changing the shuffle code to use the ByteArrayManager instead of the ramfs, depends in part
on the patch for HADOOP-2095 (since that patch changes the layout of the intermediate
sequence file). I'll see what else can be done before that patch is available.
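
To make the idea of merging raw byte arrays concrete, here is a minimal sketch of a k-way
merge over sorted in-memory segments. It is not the code in 3366.1.patch. The class and
method names (ByteArrayMergeSketch, Cursor, segment, merge) are hypothetical, and it assumes
4-byte big-endian lengths, the <key-len><val-len><key><value> record layout, and raw
unsigned byte-wise key ordering, none of which this message specifies.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class ByteArrayMergeSketch {
    /** Cursor over one sorted segment; keys are compared in place, without copying. */
    static final class Cursor implements Comparable<Cursor> {
        final byte[] data;
        int pos;            // offset of the current record's key-len field
        int keyLen, valLen; // lengths of the current record

        Cursor(byte[] data) { this.data = data; }

        /** Reads the length header at pos; returns false when the segment is exhausted. */
        boolean load() {
            if (pos + 8 > data.length) return false;
            ByteBuffer header = ByteBuffer.wrap(data, pos, 8);
            keyLen = header.getInt();
            valLen = header.getInt();
            if (keyLen < 0 || valLen < 0 || pos + 8L + keyLen + valLen > data.length) {
                throw new IllegalArgumentException("truncated record at offset " + pos);
            }
            return true;
        }

        int recordLength() { return 8 + keyLen + valLen; }

        @Override
        public int compareTo(Cursor other) {
            int a = pos + 8, b = other.pos + 8; // keys start right after the two lengths
            int n = Math.min(keyLen, other.keyLen);
            for (int i = 0; i < n; i++) {
                int x = data[a + i] & 0xff, y = other.data[b + i] & 0xff;
                if (x != y) return x - y;
            }
            return keyLen - other.keyLen;
        }
    }

    /** Merges segments that are each sorted by key into one sorted byte array. */
    static byte[] merge(List<byte[]> segments) {
        int total = 0;
        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        for (byte[] s : segments) {
            total += s.length;
            Cursor c = new Cursor(s);
            if (c.load()) heap.add(c);
        }
        byte[] out = new byte[total]; // assumes well-formed segments with no trailing bytes
        int outPos = 0;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll(); // segment whose current key is smallest
            int len = c.recordLength();
            System.arraycopy(c.data, c.pos, out, outPos, len);
            outPos += len;
            c.pos += len;
            if (c.load()) heap.add(c);
        }
        return out;
    }

    /** Test helper: builds a segment from alternating key, value strings. */
    static byte[] segment(String... keyValuePairs) {
        int size = 0;
        for (String s : keyValuePairs) size += 4 + s.getBytes(StandardCharsets.UTF_8).length;
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (int i = 0; i < keyValuePairs.length; i += 2) {
            byte[] k = keyValuePairs[i].getBytes(StandardCharsets.UTF_8);
            byte[] v = keyValuePairs[i + 1].getBytes(StandardCharsets.UTF_8);
            buf.putInt(k.length).putInt(v.length).put(k).put(v);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] merged = merge(Arrays.asList(segment("a", "1", "c", "3"), segment("b", "2")));
        System.out.println("merged bytes: " + merged.length);
    }
}
```

The heap holds one cursor per segment, so each output record costs one poll and at most one
add, and records are copied with System.arraycopy rather than pushed through a stream.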

By the way, I wrote the patch assuming the record layout <key-len><val-len><key><value>
(the difference from the earlier proposed layout is that the two lengths are adjacent). That
made the parsing of the byte arrays simpler.
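
As an illustration of why adjacent lengths simplify parsing, here is a minimal sketch that
encodes and decodes records in the <key-len><val-len><key><value> layout. It assumes 4-byte
big-endian lengths, which this message does not specify (the actual patch may encode lengths
differently), and RecordLayoutSketch, Entry, encode and parse are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class RecordLayoutSketch {
    /** One decoded key/value pair. */
    static final class Entry {
        final byte[] key;
        final byte[] value;
        Entry(byte[] key, byte[] value) { this.key = key; this.value = value; }
    }

    /** Writes a single record as <key-len><val-len><key><value>. */
    static byte[] encode(byte[] key, byte[] value) {
        ByteBuffer buf = ByteBuffer.allocate(8 + key.length + value.length);
        buf.putInt(key.length);
        buf.putInt(value.length);
        buf.put(key);
        buf.put(value);
        return buf.array();
    }

    /** Decodes every record in a byte array of back-to-back records. */
    static List<Entry> parse(byte[] data) {
        List<Entry> entries = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.remaining() >= 8) {
            // Both lengths sit in one fixed-size header, so a single read
            // locates the key, the value, and the start of the next record.
            int keyLen = buf.getInt();
            int valLen = buf.getInt();
            if (keyLen < 0 || valLen < 0 || (long) keyLen + valLen > buf.remaining()) {
                throw new IllegalArgumentException("truncated or corrupt record");
            }
            byte[] key = new byte[keyLen];
            byte[] value = new byte[valLen];
            buf.get(key);
            buf.get(value);
            entries.add(new Entry(key, value));
        }
        return entries;
    }

    public static void main(String[] args) {
        byte[] rec = encode("k1".getBytes(StandardCharsets.UTF_8),
                            "value".getBytes(StandardCharsets.UTF_8));
        System.out.println("records parsed: " + parse(rec).size());
    }
}
```

With the lengths interleaved instead (<key-len><key><val-len><value>), the parser would have
to skip over the key before it could learn the value length.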

> Shuffle/Merge improvements
> --------------------------
>                 Key: HADOOP-3366
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3366
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.18.0
>         Attachments: 3366.1.patch
> This is intended to be a meta-issue to track various improvements to shuffle/merge in
> the reducer.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
