Date: Tue, 8 Feb 2011 14:58:57 +0000 (UTC)
From: "Jay Hacker (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Message-ID: <2003604646.2401.1297177137606.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] Created: (MAPREDUCE-2308) Sort buffer size (io.sort.mb) is limited to < 2 GB

Sort buffer size (io.sort.mb) is limited to < 2 GB
--------------------------------------------------

                 Key: MAPREDUCE-2308
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2308
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 0.21.0, 0.20.2, 0.20.1
         Environment: Cloudera CDH3b3 (0.20.2+)
            Reporter: Jay Hacker
            Priority: Minor

I have MapReduce jobs that use a large amount of per-task memory, because the algorithm I'm using converges faster if more data is together on a node. I have my JVM heap size set at 3200 MB, and if I use the popular rule of thumb that io.sort.mb should be ~70% of that, I get 2240 MB. I rounded this down to 2048 MB, but map tasks crash with:

{noformat}
java.io.IOException: Invalid "io.sort.mb": 2048
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:790)
        ...
{noformat}

MapTask.MapOutputBuffer implements its buffer with a byte[] of size io.sort.mb (in bytes), and sanity-checks the size before allocating the array. The problem is that Java arrays can't have more than 2^31 - 1 elements (even with a 64-bit JVM), and this is a limitation of the Java language specification itself. 2048 MB is exactly 2^31 bytes, one element over that limit, so the largest value that can pass the check is 2047. As memory and data sizes grow, this would seem to be a crippling limitation of Java.

It would be nice if this ceiling were documented, and an error issued sooner, e.g. in jobtracker startup upon reading the config. Going forward, we may need to implement some array of arrays hack for large buffers. :(
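The ceiling and the "array of arrays" idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Hadoop code: the class name SortBufferSketch, its methods, and the chunk size are all hypothetical, and the size check is written from the 2^31 - 1 array limit rather than copied from MapTask.

```java
/**
 * Hypothetical sketch, not part of Hadoop: a byte buffer addressed by a long
 * index and backed by several byte[] chunks, so its total capacity is not
 * bound by the 2^31 - 1 element limit of a single Java array.
 */
final class SortBufferSketch {
    private final byte[][] chunks;
    private final int chunkSize;
    private final long capacity;

    /** Allocates ceil(capacity / chunkSize) chunks; the last one may be shorter. */
    public SortBufferSketch(long capacity, int chunkSize) {
        if (capacity < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("capacity must be >= 0 and chunkSize > 0");
        }
        long chunkCount = capacity / chunkSize + (capacity % chunkSize == 0 ? 0 : 1);
        if (chunkCount > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many chunks: " + chunkCount);
        }
        this.capacity = capacity;
        this.chunkSize = chunkSize;
        this.chunks = new byte[(int) chunkCount][];
        long remaining = capacity;
        for (int i = 0; i < chunks.length; i++) {
            int len = (int) Math.min((long) chunkSize, remaining);
            chunks[i] = new byte[len];
            remaining -= len;
        }
    }

    /** True if a buffer of sortMb megabytes fits in one byte[] (at most 2^31 - 1 bytes). */
    public static boolean fitsInSingleArray(int sortMb) {
        // Widen to long before shifting so that 2048 << 20 does not overflow int.
        return sortMb >= 0 && ((long) sortMb << 20) <= Integer.MAX_VALUE;
    }

    public long capacity() {
        return capacity;
    }

    public byte get(long index) {
        checkIndex(index);
        return chunks[(int) (index / chunkSize)][(int) (index % chunkSize)];
    }

    public void put(long index, byte value) {
        checkIndex(index);
        chunks[(int) (index / chunkSize)][(int) (index % chunkSize)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index " + index + ", capacity " + capacity);
        }
    }

    public static void main(String[] args) {
        System.out.println("2047 MB fits in one byte[]: " + fitsInSingleArray(2047));
        System.out.println("2048 MB fits in one byte[]: " + fitsInSingleArray(2048));

        // Tiny sizes keep the demo cheap; a real buffer would use much larger chunks.
        SortBufferSketch buf = new SortBufferSketch(10, 4); // chunks of 4, 4 and 2 bytes
        buf.put(9, (byte) 42);
        System.out.println("byte at index 9: " + buf.get(9));
    }
}
```

A check like fitsInSingleArray could run wherever the configuration is first read, which is the earlier failure point suggested above. The chunked layout costs one extra division and modulo per access, so whether it is acceptable for a sort buffer would need measuring.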