Message-ID: <49A4D9CD.4050206@archive.org>
Date: Tue, 24 Feb 2009 21:40:29 -0800
From: Gordon Mohr
To: core-user@hadoop.apache.org
Subject: Re: OutOfMemory error processing large amounts of gz files
In-Reply-To: <22193552.post@talk.nabble.com>

If you're doing a lot of gzip compression/decompression, you *might* be
hitting this 6+-year-old Sun JVM bug:

"Instantiating Inflater/Deflater causes OutOfMemoryError; finalizers not
called promptly enough"
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4797189

A workaround is listed in the issue: make sure you call close() or end() on
the Deflater; something similar may apply to Inflater. (A rough sketch of
that kind of explicit cleanup follows the quoted message below.)

(This is one of those fun JVM situations where having more heap space can
make OOMEs *more* likely: with less heap pressure, more un-GC'd or
un-finalized objects hang around, each holding a bit of native memory.)

- Gordon @ IA

bzheng wrote:
> I have about 24k gz files (about 550GB total) on HDFS and a really simple
> Java program to convert them into sequence files. If the script's
> setInputPaths takes a Path[] of all 24k files, it gets an OutOfMemory
> error at about 35% map completion. If I make the script process 2k files
> per job and run 12 jobs consecutively, it goes through all the files
> fine. The cluster I'm using has about 67 nodes; each node has 16GB
> memory, a max of 7 map tasks, and a max of 2 reduce tasks.
>
> The map task is really simple: it takes a LongWritable key and a Text
> value, generates a Text newKey, and calls output.collect(newKey, value).
> It doesn't have any code that could possibly leak memory.
>
> There's no stack trace for the vast majority of the OutOfMemory errors;
> there's just a single line in the log like this:
>
> 2009-02-23 14:27:50,902 INFO org.apache.hadoop.mapred.TaskTracker:
> java.lang.OutOfMemoryError: Java heap space
>
> I can't find the stack trace right now, but on rare occasions the
> OutOfMemory error originates from some Hadoop config array copy
> operation. There's no special config for the script.
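
As a rough sketch of the workaround described above (the class and method
names here are illustrative only, not part of Hadoop or this thread): call
end() on a Deflater/Inflater in a finally block so the native zlib memory is
released immediately, instead of waiting for the finalizer. For stream
wrappers such as GZIPInputStream, closing the stream in a finally block has
the same effect.

import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative sketch: release zlib's native state with end() in a finally
// block rather than relying on finalization to free it.
public class ZlibCleanupSketch {

    // Compress a byte array, always ending the Deflater when done.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end();   // frees the native zlib memory immediately
        }
    }

    // Decompress a zlib-compressed byte array, ending the Inflater the same way.
    static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!inflater.finished()) {
                out.write(buf, 0, inflater.inflate(buf));
            }
            return out.toByteArray();
        } finally {
            inflater.end();   // same idea on the decompression side
        }
    }
}

If the map task opens compressed streams directly, the analogous fix is to
close each stream in a finally block as soon as it has been consumed, rather
than letting it fall out of scope and waiting for the finalizer to release
its native buffers.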