hadoop-common-user mailing list archives

From Geet Garg <garggee...@gmail.com>
Subject Error while running Terrier-3.0 on hadoop or mahout-0.3 on hadoop
Date Wed, 27 Oct 2010 17:44:00 GMT
Hi,

I'm trying to run Terrier-3.0 on hadoop-0.18.3, with general configuration
settings. My hadoop cluster is running on 3 nodes (1 master, 3 slaves). If
I try to run Terrier Basic Single Pass Indexing (with default
configurations) on a small dataset (~1 GB), it works fine. But for a larger
dataset (~10 GB), I get the following error:

attempt_201010272120_0001_m_000002_0: java.lang.OutOfMemoryError: GC overhead limit exceeded
attempt_201010272120_0001_m_000002_0:   at org.terrier.structures.indexing.singlepass.hadoop.SplitEmittedTerm.createNewTerm(SplitEmittedTerm.java:64)
attempt_201010272120_0001_m_000002_0:   at org.terrier.structures.indexing.singlepass.hadoop.HadoopRunWriter.writeTerm(HadoopRunWriter.java:84)
attempt_201010272120_0001_m_000002_0:   at org.terrier.structures.indexing.singlepass.MemoryPostings.writeToWriter(MemoryPostings.java:151)
attempt_201010272120_0001_m_000002_0:   at org.terrier.structures.indexing.singlepass.MemoryPostings.finish(MemoryPostings.java:112)
attempt_201010272120_0001_m_000002_0:   at org.terrier.indexing.hadoop.Hadoop_BasicSinglePassIndexer.forceFlush(Hadoop_BasicSinglePassIndexer.java:308)
attempt_201010272120_0001_m_000002_0:   at org.terrier.indexing.hadoop.Hadoop_BasicSinglePassIndexer.closeMap(Hadoop_BasicSinglePassIndexer.java:419)
attempt_201010272120_0001_m_000002_0:   at org.terrier.indexing.hadoop.Hadoop_BasicSinglePassIndexer.close(Hadoop_BasicSinglePassIndexer.java:236)
attempt_201010272120_0001_m_000002_0:   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
attempt_201010272120_0001_m_000002_0:   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
attempt_201010272120_0001_m_000002_0:   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)


Also, I tried running Mahout-0.3 on hadoop-0.20.2. It works fine for tasks
on small datasets (< 1 MB). But for even slightly larger datasets (~30 MB)
it fails with the following error:

Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.mahout.fpm.pfpgrowth.TransactionTree.resize(TransactionTree.java:446)
        at org.apache.mahout.fpm.pfpgrowth.TransactionTree.createNode(TransactionTree.java:409)
        at org.apache.mahout.fpm.pfpgrowth.TransactionTree.addPattern(TransactionTree.java:202)
        at org.apache.mahout.fpm.pfpgrowth.TransactionTree.getCompressedTree(TransactionTree.java:285)
        at org.apache.mahout.fpm.pfpgrowth.ParallelFPGrowthCombiner.reduce(ParallelFPGrowthCombiner.java:51)
        at org.apache.mahout.fpm.pfpgrowth.ParallelFPGrowthCombiner.reduce(ParallelFPGrowthCombiner.java:33)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
        at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)


I'm absolutely stuck. I've tried increasing the Java heap size in
hadoop-env.sh and switching to the parallel GC. Nothing seems to work.
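
For illustration only, changes of this kind in conf/hadoop-env.sh usually
look roughly like the sketch below. The values are assumptions chosen for
the example, not the settings actually used on this cluster.

```shell
# conf/hadoop-env.sh (illustrative sketch; values are assumptions)

# Maximum heap size, in MB, for the JVMs started by the Hadoop scripts.
export HADOOP_HEAPSIZE=2000

# Extra JVM options for Hadoop commands; here, enable the parallel collector.
export HADOOP_OPTS="${HADOOP_OPTS:-} -XX:+UseParallelGC"
```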

Can anyone help me please?

Thanks.

Regards,
Geet

-- 
Geet Garg
Final Year Dual Degree Student
Department of Computer Science and Engineering
Indian Institute of Technology Kharagpur
INDIA
Phone: +91 97344 26187
e-Mail: garggeetus@gmail.com
