nutch-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: By Indexing I get: OutOfMemoryError: GC overhead limit exceeded ...
Date Sat, 06 Mar 2010 14:42:38 GMT
Can you use 'ps aux' to find out the -Xmx command-line parameter passed to
java for the following action?
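A quick sketch of the check suggested above (the grep pattern and the sample command line are illustrative; your process listing will differ). It pulls the -Xmx flag out of each running java process, and then shows the same extraction on a sample command line so you know what to look for:

```shell
# List running java processes and extract any -Xmx heap setting they were started with.
# '[j]ava' avoids matching the grep process itself; '|| true' tolerates no matches.
ps aux | grep '[j]ava' | grep -o '\-Xmx[0-9]*[kKmMgG]' || true

# The same extraction on a sample command line (hypothetical values), to show the output shape:
echo 'java -Xmx1000m org.apache.nutch.indexer.Indexer' | grep -o '\-Xmx[0-9]*[kKmMgG]'
```

Note that when the indexer is launched via bin/nutch, the heap is normally governed by NUTCH_HEAPSIZE (which defaults to 1000 MB), so a HADOOP_HEAPSIZE setting may not reach this JVM at all; the ps output will tell you which value actually took effect.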

On Fri, Mar 5, 2010 at 1:14 PM, Patricio Galeas <pgaleas@yahoo.de> wrote:

> Hello all,
> I am running Nutch in a virtual machine (Debian) with 8 GB RAM and 1.5 TB
> for the Hadoop temporary folder.
> Running the index process on a 1.3 GB segments folder, I got
> "OutOfMemoryError: GC overhead limit exceeded" (see below).
>
> I created the segments using slice=50000,
> and I also set HADOOP_HEAPSIZE to the maximum physical memory (8000).
>
> Do I need more memory to run the index process?
> Are there any limitations to running Nutch in a virtual machine?
>
> Thank you!
> Pato
>
> ...
> ...
> 2010-03-05 19:52:13,864 INFO  plugin.PluginRepository -         Nutch
> Scoring (org.apache.nutch.scoring.ScoringFilter)
> 2010-03-05 19:52:13,864 INFO  plugin.PluginRepository -         Ontology
> Model Loader (org.apache.nutch.ontology.Ontology)
> 2010-03-05 19:52:13,867 INFO  lang.LanguageIdentifier - Language identifier
> configuration
> 2010-03-05 19:52:22,961 INFO  lang.LanguageIdentifier - Language identifier
> plugin supports: it(1000) is(1000) hu(1000) th(1000) sv(1000) sq(1000)
> fr(1000) ru(1000) fi(1000) es(1000) en(1000) el(1000) ee(1000) pt(1000)
> de(1000) da(1000) pl(1000) no(1000) nl(1000)
> 2010-03-05 19:52:22,961 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.analysis.lang.LanguageIndexingFilter
> 2010-03-05 19:52:22,963 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2010-03-05 19:52:22,964 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2010-03-05 19:52:36,278 WARN  mapred.LocalJobRunner - job_local_0001
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>        at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
>        at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:775)
>        at org.apache.hadoop.io.Text.encode(Text.java:388)
>        at org.apache.hadoop.io.Text.encode(Text.java:369)
>        at org.apache.hadoop.io.Text.writeString(Text.java:409)
>        at org.apache.nutch.parse.Outlink.write(Outlink.java:52)
>        at org.apache.nutch.parse.ParseData.write(ParseData.java:152)
>        at
> org.apache.hadoop.io.GenericWritable.write(GenericWritable.java:135)
>        at
> org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
>        at
> org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
>        at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:613)
>        at
> org.apache.nutch.indexer.IndexerMapReduce.map(IndexerMapReduce.java:67)
>        at
> org.apache.nutch.indexer.IndexerMapReduce.map(IndexerMapReduce.java:50)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>        at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
> 2010-03-05 19:52:37,277 FATAL indexer.Indexer - Indexer:
> java.io.IOException: Job failed!
>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>        at org.apache.nutch.indexer.Indexer.index(Indexer.java:72)
>        at org.apache.nutch.indexer.Indexer.run(Indexer.java:92)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.nutch.indexer.Indexer.main(Indexer.java:101)
>
>
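If the -Xmx does turn out to be small, a common remedy for an OOM like this in the LocalJobRunner era of Hadoop (0.19/0.20, which Nutch 1.0 ships with) is to raise the child JVM heap via mapred.child.java.opts. A sketch for conf/hadoop-site.xml, with a hypothetical 2000 MB value you would tune to your machine:

```xml
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2000m</value>
</property>
```

In purely local mode the map task runs inside the client JVM, so also export NUTCH_HEAPSIZE (in MB) before invoking bin/nutch and verify the resulting -Xmx with ps as above.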
