From: "Sameer Al-Sakran (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Message-ID: <14812510.1190085045061.JavaMail.jira@brutus>
In-Reply-To: <33405454.1188949005364.JavaMail.jira@brutus>
Date: Mon, 17 Sep 2007 20:10:45 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-1837) Insufficient space exception from InMemoryFileSystem after raising fs.inmemory.size.mb
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ https://issues.apache.org/jira/browse/HADOOP-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12528234 ]

Sameer Al-Sakran commented on HADOOP-1837:
------------------------------------------

I am getting this bug relatively frequently, with 1 hour+ timeouts. I've ratcheted up the in-memory file system to as high as 1 GB with no effect.

> Insufficient space exception from InMemoryFileSystem after raising fs.inmemory.size.mb
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1837
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1837
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.13.1
>            Reporter: Joydeep Sen Sarma
>            Priority: Minor
>
> trying out a larger in-memory file system (curious whether that helped speed up the sort phase). In this run I had sized it to 500 MB. There's plenty of RAM in the machine (8 GB) and the tasks are launched with the -Xmx2048 option (so there's plenty of heap space as well). However, I'm observing this exception:
>
> 2007-09-04 13:47:51,718 INFO org.apache.hadoop.mapred.ReduceTask: task_0002_r_000002_0 Copying task_0002_m_000124_0 output from hadoop004.sf2p.facebook.com.
> 2007-09-04 13:47:52,188 WARN org.apache.hadoop.mapred.ReduceTask: task_0002_r_000002_0 copy failed: task_0002_m_000124_0 from hadoop004.sf2p.facebook.com
> 2007-09-04 13:47:52,189 WARN org.apache.hadoop.mapred.ReduceTask: java.io.IOException: Insufficient space
>         at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryOutputStream.write(InMemoryFileSystem.java:181)
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:38)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:91)
>         at org.apache.hadoop.fs.ChecksumFileSystem$FSOutputSummer.close(ChecksumFileSystem.java:416)
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:48)
>         at org.apache.hadoop.fs.FSDataOutputStream$Buffer.close(FSDataOutputStream.java:72)
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:92)
>         at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:251)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:680)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:641)
> 2007-09-04 13:47:52,189 WARN org.apache.hadoop.mapred.ReduceTask: task_0002_r_000002_0 adding host hadoop004.sf2p.facebook.com to penalty box, next contact in 64 seconds
>
> so this ends up slowing stuff down, since we back off on the source host (even though it's not its fault). Looking at the code, it seems like ReduceTask is trying to write more to InMemoryFileSystem than it should.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
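
[Editor's note] For readers trying to reproduce the setting discussed above: the fs.inmemory.size.mb property named in the issue title is set through Hadoop's ordinary site configuration. A minimal sketch, assuming the hadoop-site.xml override mechanism of the 0.13.x era; the value shown matches the 500 MB the reporter tried, and per this issue raising it (even to 1 GB) did not avoid the exception:

```xml
<!-- hadoop-site.xml (illustrative fragment):
     size cap, in megabytes, for the in-memory file system used to hold
     map outputs on the reduce side during the copy/merge phase. -->
<property>
  <name>fs.inmemory.size.mb</name>
  <value>500</value>
</property>
```

Note that this buffer lives inside the reduce task's JVM heap, which is presumably why the reporter also raised the task heap via the -Xmx child option; the thread's point is that the failure persisted regardless, suggesting an accounting problem in ReduceTask rather than a genuinely undersized buffer.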