Date: Sat, 27 Oct 2012 14:37:11 +0000 (UTC)
From: "Gopal V (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Updated] (MAPREDUCE-4755) Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect

[ https://issues.apache.org/jira/browse/MAPREDUCE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated MAPREDUCE-4755:
-------------------------------
    Attachment: (was: 0001-first-cut-of-MMapOutputBuffer.patch)

> Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect
> ---------------------------------------------------------------------------
>
> Key: MAPREDUCE-4755
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4755
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 3.0.0
> Environment: Ubuntu 12.10 x86_64 (Bulldozer 8-core)
> Reporter: Gopal V
> Assignee: Gopal V
> Labels: optimization, sort
>
> The MapOutputBuffer was written under a very severe constraint on the amount of memory it can consume. This results in code that has to page in and page out (i.e., spill) data as it passes through the map buffers.
> With the advent of the java.nio package, there is a fast and portable mmap alternative to managing your own buffers. The mapped memory lives outside Java's GC space and yet provides reasonably fast access to all the data.
> The suggestion is that mmap() direct buffers can be faster when a spill is involved, and simpler than the current spill logic when given enough address space, since they use the buffer cache to deliver best-effort I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
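
For readers unfamiliar with the java.nio facility the issue description refers to, here is a minimal sketch of a memory-mapped direct buffer. It is not taken from the attached patch: the class name MappedBufferSketch and the methods mapFile and writeAndSum are hypothetical, and the integer records only stand in for real map output. It shows that writes go to memory outside the Java heap and that flushing is left to the OS buffer cache unless force() is called.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not the MapOutputBuffer rewrite itself.
class MappedBufferSketch {

    /** Maps the first `size` bytes of `file` read-write, outside the Java heap. */
    static MappedByteBuffer mapFile(Path file, int size) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The file is extended to `size` if it is shorter; the mapping
            // remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    /** Writes `records` ints through the mapping, then reads them back and sums them. */
    static long writeAndSum(Path file, int records) throws IOException {
        MappedByteBuffer buffer = mapFile(file, records * Integer.BYTES);
        for (int i = 0; i < records; i++) {
            buffer.putInt(i);   // plain memory write; the OS decides when to page it out
        }
        buffer.force();         // optional: ask the OS to flush dirty pages to the file
        buffer.flip();          // switch from writing to reading
        long sum = 0;
        while (buffer.hasRemaining()) {
            sum += buffer.getInt();
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("map-output", ".buf");
        file.toFile().deleteOnExit();
        System.out.println(writeAndSum(file, 1000));
    }
}
```

In this sketch there is no explicit spill step: the buffer is backed by the file from the start, so paging data in and out is delegated to the operating system rather than handled by application code.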