Date: Sat, 27 Oct 2012 09:11:12 +0000 (UTC)
From: "Gopal V (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Updated] (MAPREDUCE-4755) Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect

[ https://issues.apache.org/jira/browse/MAPREDUCE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated MAPREDUCE-4755:
-------------------------------
Description:
The MapOutputBuffer was written under a very severe constraint on the amount of memory it can consume. As a result, the code has to page data in and out (i.e. spill it) as it passes through the map buffers.
With the advent of the java.nio package, there is a fast and portable mmap alternative to managing your own buffers. These buffers live outside the Java GC heap, yet still provide reasonably fast access to all of the data.

The suggestion is that mmap()-backed direct buffers can be faster when a spill is involved, and simpler than the current spill logic, provided there is enough address space; they also rely on the OS buffer cache to deliver best-effort I/O.

> Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4755
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4755
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>         Environment: Ubuntu 12.10 x86_64 (Bulldozer 8-core)
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: optimization, sort
>         Attachments: 0001-first-cut-of-MMapOutputBuffer.patch
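The core idea in the description, holding map output in a memory-mapped region that lives outside the Java heap, can be illustrated with the standard java.nio API. This is a minimal sketch, not the attached MMapOutputBuffer patch: the class name `MMapCollectSketch`, the record layout (`[keyLen][key][valLen][value]`) and the fixed mapping capacity are assumptions made for illustration. It maps a temporary file with `FileChannel.map`, appends records with relative puts, and reads keys back with absolute gets, so the bytes sit in the OS page cache rather than on the GC heap.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical sketch of a map-output collector backed by an mmap()'d file.
 * Records are appended as [keyLen][key][valLen][value]; the data lives in
 * the OS page cache, outside the Java GC heap.
 */
public class MMapCollectSketch {
    private final FileChannel channel;
    private final MappedByteBuffer buf;

    public MMapCollectSketch(Path file, int capacity) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        // Maps bytes [0, capacity) read-write; the file is extended if it is shorter.
        buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
    }

    /** Appends one key/value record and returns the offset where it starts. */
    public int collect(byte[] key, byte[] value) {
        int start = buf.position();
        buf.putInt(key.length).put(key).putInt(value.length).put(value);
        return start;
    }

    /** Decodes the key of the record at offset using absolute gets (position unchanged). */
    public String keyAt(int offset) {
        int len = buf.getInt(offset);
        byte[] key = new byte[len];
        for (int i = 0; i < len; i++) {
            key[i] = buf.get(offset + 4 + i);
        }
        return new String(key, StandardCharsets.UTF_8);
    }

    /** Mapped buffers are direct buffers, i.e. not backed by a heap array. */
    public boolean isOffHeap() {
        return buf.isDirect();
    }

    public void close() throws IOException {
        buf.force(); // write dirty pages back to the backing file
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapout", ".bin");
        tmp.toFile().deleteOnExit();
        MMapCollectSketch c = new MMapCollectSketch(tmp, 1 << 20);
        int a = c.collect("apple".getBytes(StandardCharsets.UTF_8),
                "1".getBytes(StandardCharsets.UTF_8));
        int b = c.collect("banana".getBytes(StandardCharsets.UTF_8),
                "2".getBytes(StandardCharsets.UTF_8));
        System.out.println(c.keyAt(a) + " " + c.keyAt(b) + " " + c.isOffHeap());
        c.close();
    }
}
```

Running `main` prints `apple banana true`. A real collector would still need a policy for a full mapping (relative `put` calls throw `BufferOverflowException`) and for the size limit of a single `FileChannel.map` call (at most `Integer.MAX_VALUE` bytes), which is where the "enough address space" caveat in the description comes in.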