From: Miles Osborne <milesosb@gmail.com>
To: core-user@hadoop.apache.org
Subject: Re: Hadoop-2438
Date: Tue, 22 Jan 2008 17:30:22 +0000

The max heap size for each child was the default, 200M.

Thanks for this tip: right now I'm playing around with machines which we
mothballed years ago (they have 512M a pop!). Once I put more memory into
them I'll see if this works.
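In the meantime, something like this should raise the child heap for a
single streaming job rather than cluster-wide. A sketch only: the jar
path, input/output paths and scripts are placeholders, and I'm assuming
the 0.15-era -jobconf option is the right way to pass
mapred.child.java.opts per job (the property itself is the one in the
docs link quoted below):

    # Placeholders throughout; the last line carries Arun's suggested
    # -Xmx512m via mapred.child.java.opts.
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
        -input /user/miles/ngrams \
        -output /user/miles/ngrams-grouped \
        -mapper ./my_mapper.sh \
        -reducer ./my_reducer.sh \
        -jobconf mapred.child.java.opts=-Xmx512m

Setting the same property in hadoop-site.xml would change the default
for every job instead.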
(Hadoop is a great way to breathe life into otherwise unloved boxes)

Miles

On 22/01/2008, Arun C Murthy wrote:
>
> On Jan 22, 2008, at 6:26 AM, Miles Osborne wrote:
>
> > Has there been any progress / a work-around for this?
> >
> > Currently I'm experimenting with Streaming and I've encountered what
> > looks like the same problem as described here:
> >
> > https://issues.apache.org/jira/browse/HADOOP-2438
> >
>
> Uh, I'm not sure how we missed H-2438, but what is your max heap size
> for the child?
> http://lucene.apache.org/hadoop/docs/r0.15.2/hadoop-default.html#mapred.child.java.opts
> Check if it works with -Xmx512m.
>
> I think I need to open a bug to bump the default up to 512M, sigh!
>
> thanks,
> Arun
>
> > So, I get much the same errors (see below).
> >
> > For this particular task, when I replace the mappers and reducers
> > with the identity operation (i.e. just pass through the data) all is
> > well. When instead I try to do something more taxing (in this case,
> > gathering together all ngrams with the same prefix), I get these
> > errors.
> >
> > My guess is that this is something to do with caching / buffering,
> > since I presume that when the Stream mapper has real work to do, the
> > associated Java streamer buffers input until the Mapper signals that
> > it can process more data. If the Mapper is busy, then a lot of data
> > would get cached, causing some internal buffer to overflow.
> >
> > Miles
> >
> > Date: Tue Jan 22 14:12:28 GMT 2008
> > java.io.IOException: Broken pipe
> >     at java.io.FileOutputStream.writeBytes(Native Method)
> >     at java.io.FileOutputStream.write(FileOutputStream.java:260)
> >     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> >     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> >     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:124)
> >     at java.io.DataOutputStream.flush(DataOutputStream.java:106)
> >     at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:96)
> >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
> >     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> >
> >     at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:107)
> >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
> >     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> >
> > java.io.IOException: MROutput/MRErrThread failed:java.lang.OutOfMemoryError: Java heap space
> >     at java.util.Arrays.copyOf(Arrays.java:2786)
> >     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> >     at java.io.DataOutputStream.write(DataOutputStream.java:90)
> >     at org.apache.hadoop.io.Text.write(Text.java:243)
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:349)
> >     at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:344)
> >
> >     at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:76)
> >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
> >     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> >
> > java.io.IOException: MROutput/MRErrThread failed:java.lang.OutOfMemoryError: Java heap space
> >     at java.util.Arrays.copyOf(Arrays.java:2786)
> >     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> >     at java.io.DataOutputStream.write(DataOutputStream.java:90)
> >     at org.apache.hadoop.io.Text.write(Text.java:243)
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:349)
> >     at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:344)
> >
> >     at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:76)
> >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
> >     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
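For reference, the two jobs contrasted above look roughly like the
following in streaming terms. These are sketches only, not the commands
from the thread: the jar path, input/output paths and script name are
invented, and the mapper assumes one whitespace-tokenised ngram per
line.

    # Identity pass-through (the case that works): cat on both sides.
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
        -input /user/miles/ngrams \
        -output /user/miles/identity-out \
        -mapper /bin/cat \
        -reducer /bin/cat

    #!/bin/sh
    # prefix_key.sh (hypothetical): emit "first-token <TAB> whole-line",
    # so streaming treats the ngram's first word as the key and the
    # shuffle gathers all ngrams sharing that prefix at one reducer.
    awk '{ print $1 "\t" $0 }'

Shipped with -file prefix_key.sh and plugged in as -mapper
./prefix_key.sh, that second script is the "more taxing" shape of job
described as failing; a mapper that does real work would let output back
up, which fits Miles's buffering guess and the OutOfMemoryError in
MapTask$MapOutputBuffer.collect in the traces above.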