nutch-user mailing list archives

From Patricio Galeas <pgal...@yahoo.de>
Subject Re: By Indexing I get: OutOfMemoryError: GC overhead limit exceeded ...
Date Mon, 08 Mar 2010 19:03:49 GMT
Hello Ted,

I ran 'ps aux' and confirmed that only 1 GB was configured.
I set NUTCH_HEAPSIZE to 8 GB (the machine's physical RAM) and the indexing
ran successfully.
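
Roughly, this is what I did (treat it as a sketch; the crawl/ paths come
from my setup, and as far as I can tell bin/nutch reads NUTCH_HEAPSIZE in
megabytes and turns it into -Xmx):

    # 8000 MB heap; bin/nutch passes this to java as -Xmx8000m
    export NUTCH_HEAPSIZE=8000
    bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*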

Do you know which parameters need to be adjusted when the server does not
have much physical RAM, say only 2 GB?
I ran a web crawl (depth=6) without the topN parameter and the segments grew exponentially.
Later I ran into many problems when merging the segments and when indexing
(out of memory, too many open files, etc.).
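
What I plan to try next on the small machine (untested, and the numbers
are guesses, not recommended values):

    # cap each fetch round so no single segment outgrows a 2 GB heap
    bin/nutch crawl urls -dir crawl -depth 6 -topN 50000
    # raise the per-process open-file limit before merging/indexing
    ulimit -n 4096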

Thank you for your help
Pato



----- Original Message ----
From: Ted Yu <yuzhihong@gmail.com>
To: nutch-user@lucene.apache.org
Sent: Saturday, 6 March 2010, 15:42:38
Subject: Re: By Indexing I get: OutOfMemoryError: GC overhead limit exceeded ...

Can you use 'ps aux' to find out the -Xmx command-line parameter passed to
java for the following action?
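For example (the brackets just keep grep from matching its own process):

    ps aux | grep '[j]ava'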

On Fri, Mar 5, 2010 at 1:14 PM, Patricio Galeas <pgaleas@yahoo.de> wrote:

> Hello all,
> I am running Nutch in a virtual machine (Debian) with 8 GB RAM and 1.5 TB
> for the Hadoop temporary folder.
> Running the index process on a 1.3 GB segments folder, I got
> "OutOfMemoryError: GC overhead limit exceeded" (see below).
>
> I created the segments using slice=50000,
> and I also set HADOOP_HEAPSIZE to the maximum physical memory (8000).
>
> Do I need more memory to run the index process?
> Are there any limitations to running Nutch in a virtual machine?
>
> Thank you!
> Pato
>
> ...
> ...
> 2010-03-05 19:52:13,864 INFO  plugin.PluginRepository -         Nutch
> Scoring (org.apache.nutch.scoring.ScoringFilter)
> 2010-03-05 19:52:13,864 INFO  plugin.PluginRepository -         Ontology
> Model Loader (org.apache.nutch.ontology.Ontology)
> 2010-03-05 19:52:13,867 INFO  lang.LanguageIdentifier - Language identifier
> configuration [1-4/2048]
> 2010-03-05 19:52:22,961 INFO  lang.LanguageIdentifier - Language identifier
> plugin supports: it(1000) is(1000) hu(1000) th(1000) sv(1000) sq(1000)
> fr(1000) ru(1000) fi(1000) es(1000) en(1000) el(1000) ee(1000) pt(1000)
> de(1000) da(1000) pl(1000) no(1000) nl(1000)
> 2010-03-05 19:52:22,961 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.analysis.lang.LanguageIndexingFilter
> 2010-03-05 19:52:22,963 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2010-03-05 19:52:22,964 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2010-03-05 19:52:36,278 WARN  mapred.LocalJobRunner - job_local_0001
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>        at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
>        at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:775)
>        at org.apache.hadoop.io.Text.encode(Text.java:388)
>        at org.apache.hadoop.io.Text.encode(Text.java:369)
>        at org.apache.hadoop.io.Text.writeString(Text.java:409)
>        at org.apache.nutch.parse.Outlink.write(Outlink.java:52)
>        at org.apache.nutch.parse.ParseData.write(ParseData.java:152)
>        at
> org.apache.hadoop.io.GenericWritable.write(GenericWritable.java:135)
>        at
> org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
>        at
> org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
>        at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:613)
>        at
> org.apache.nutch.indexer.IndexerMapReduce.map(IndexerMapReduce.java:67)
>        at
> org.apache.nutch.indexer.IndexerMapReduce.map(IndexerMapReduce.java:50)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>        at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
> 2010-03-05 19:52:37,277 FATAL indexer.Indexer - Indexer:
> java.io.IOException: Job failed!
>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>        at org.apache.nutch.indexer.Indexer.index(Indexer.java:72)
>        at org.apache.nutch.indexer.Indexer.run(Indexer.java:92)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.nutch.indexer.Indexer.main(Indexer.java:101)
>

