Subject: Error while running Terrier-3.0 on hadoop or mahout-0.3 on hadoop
From: Geet Garg <garggeetus@gmail.com>
To: common-user@hadoop.apache.org
Date: Wed, 27 Oct 2010 23:14:00 +0530

Hi,

I'm trying to run Terrier-3.0 on hadoop-0.18.3 with general configuration settings. My Hadoop cluster is running on 3 nodes (1 master, 3 slaves). If I run Terrier Basic Single Pass Indexing (with default configurations) on a very small dataset (~1 GB), it works fine.
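To be concrete, by "general configuration settings" I mean I haven't overridden the per-task JVM options, so as far as I know the map and reduce children run with the stock heap. A sketch of the relevant property (the -Xmx value below is the shipped default, not something I set):

    <!-- hadoop-site.xml (0.18.x): JVM options for each map/reduce child task -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx200m</value>
    </property>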
But for larger data (~10 GB), I get this error:

    attempt_201010272120_0001_m_000002_0: java.lang.OutOfMemoryError: GC overhead limit exceeded
    attempt_201010272120_0001_m_000002_0:     at org.terrier.structures.indexing.singlepass.hadoop.SplitEmittedTerm.createNewTerm(SplitEmittedTerm.java:64)
    attempt_201010272120_0001_m_000002_0:     at org.terrier.structures.indexing.singlepass.hadoop.HadoopRunWriter.writeTerm(HadoopRunWriter.java:84)
    attempt_201010272120_0001_m_000002_0:     at org.terrier.structures.indexing.singlepass.MemoryPostings.writeToWriter(MemoryPostings.java:151)
    attempt_201010272120_0001_m_000002_0:     at org.terrier.structures.indexing.singlepass.MemoryPostings.finish(MemoryPostings.java:112)
    attempt_201010272120_0001_m_000002_0:     at org.terrier.indexing.hadoop.Hadoop_BasicSinglePassIndexer.forceFlush(Hadoop_BasicSinglePassIndexer.java:308)
    attempt_201010272120_0001_m_000002_0:     at org.terrier.indexing.hadoop.Hadoop_BasicSinglePassIndexer.closeMap(Hadoop_BasicSinglePassIndexer.java:419)
    attempt_201010272120_0001_m_000002_0:     at org.terrier.indexing.hadoop.Hadoop_BasicSinglePassIndexer.close(Hadoop_BasicSinglePassIndexer.java:236)
    attempt_201010272120_0001_m_000002_0:     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    attempt_201010272120_0001_m_000002_0:     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
    attempt_201010272120_0001_m_000002_0:     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)

Also, I tried running Mahout-0.3 on hadoop-0.20.2. It works fine for tasks on small datasets (< 1 MB), but for even slightly larger datasets (~30 MB) it starts failing with:

    Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.mahout.fpm.pfpgrowth.TransactionTree.resize(TransactionTree.java:446)
        at org.apache.mahout.fpm.pfpgrowth.TransactionTree.createNode(TransactionTree.java:409)
        at org.apache.mahout.fpm.pfpgrowth.TransactionTree.addPattern(TransactionTree.java:202)
        at org.apache.mahout.fpm.pfpgrowth.TransactionTree.getCompressedTree(TransactionTree.java:285)
        at org.apache.mahout.fpm.pfpgrowth.ParallelFPGrowthCombiner.reduce(ParallelFPGrowthCombiner.java:51)
        at org.apache.mahout.fpm.pfpgrowth.ParallelFPGrowthCombiner.reduce(ParallelFPGrowthCombiner.java:33)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
        at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)

I'm absolutely stuck. I've tried increasing the Java heap size in hadoop-env.sh, and I've tried using the parallel GC (the exact settings I tried are in the P.S. below). Nothing seems to work. Can anyone help me, please?

Thanks.

Regards,
Geet

--
Geet Garg
Final Year Dual Degree Student
Department of Computer Science and Engineering
Indian Institute of Technology Kharagpur
INDIA
Phone: +91 97344 26187
e-Mail: garggeetus@gmail.com
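P.S. For reference, here is roughly what I changed when trying the larger heap and the parallel collector (reconstructed from memory, so treat the exact values as approximate):

    # hadoop-env.sh: daemon heap size in MB (I'm not sure this affects the task JVMs at all)
    export HADOOP_HEAPSIZE=2000

    <!-- hadoop-site.xml (0.18.x) / mapred-site.xml (0.20.x): bigger child heap plus parallel GC -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1024m -XX:+UseParallelGC</value>
    </property>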