hadoop-mapreduce-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-4755) Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect
Date Sat, 27 Oct 2012 08:39:13 GMT
Gopal V created MAPREDUCE-4755:
----------------------------------

             Summary: Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect
                 Key: MAPREDUCE-4755
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4755
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
    Affects Versions: 3.0.0
         Environment: Ubuntu 12.10 x86_64 (Bulldozer 8-core)
            Reporter: Gopal V
            Assignee: Gopal V
         Attachments: 0001-first-cut-of-MMapOutputBuffer.patch

The MapOutputBuffer was written under a very severe constraint on the amount of memory
it can consume. As a result, the code has to page data in and out (i.e., spill) as
it passes through the map buffers.

With the advent of the java.nio package, there is a fast and portable mmap alternative to
managing your own buffers. Mapped buffers live outside the Java GC heap and yet provide
reasonably fast memory access to all the data.

The suggestion is that mmap()-backed direct buffers can be faster when a spill is involved,
and simpler than the current spill logic, provided there is enough address space. They rely
on the OS buffer cache to deliver best-effort I/O.
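
As a rough illustration of the idea (not taken from the attached patch), the sketch below
maps a temporary file with java.nio and appends length-prefixed records to it. The class
name MmapBufferSketch, the helpers mapBuffer/collect/read, and the record layout are all
hypothetical, chosen only to show how a mapped buffer could stand in for a heap-allocated
collect buffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapBufferSketch {

    // Hypothetical helper: map `capacity` bytes of `file` as a read/write buffer.
    // The mapping lives outside the Java heap and stays valid after the channel closes.
    static MappedByteBuffer mapBuffer(Path file, int capacity) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
        }
    }

    // Hypothetical record layout: 4-byte length followed by the payload.
    // Returns the offset at which the record starts.
    static int collect(MappedByteBuffer buf, byte[] record) {
        int offset = buf.position();
        buf.putInt(record.length);
        buf.put(record);
        return offset;
    }

    // Read a record back by offset without disturbing the write position.
    static byte[] read(MappedByteBuffer buf, int offset) {
        int len = buf.getInt(offset);
        byte[] out = new byte[len];
        ByteBuffer view = buf.duplicate(); // independent position, shared content
        view.position(offset + 4);
        view.get(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("map-output", ".buf");
        file.toFile().deleteOnExit();

        MappedByteBuffer buf = mapBuffer(file, 1 << 20); // 1 MiB, an arbitrary size
        int off = collect(buf, "key1\tvalue1".getBytes(StandardCharsets.UTF_8));

        // No explicit spill step: the OS writes dirty pages back on its own schedule.
        // force() only requests that the changes be flushed to the file now.
        buf.force();

        System.out.println(new String(read(buf, off), StandardCharsets.UTF_8));
    }
}
```

In this sketch there is no separate spill path: writes go into the mapping, and paging to
and from disk is left to the operating system.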

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
