hadoop-common-user mailing list archives

From "Emmanuel JOKE" <joke...@gmail.com>
Subject OutOfMemory
Date Sat, 30 Jun 2007 17:32:10 GMT

I tried to update my db, using the following command:
 bin/nutch updatedb crawld/crawldb crawld/segments/20070628095836

and both of my nodes failed with the following exception:
2007-06-30 12:24:29,688 INFO  mapred.TaskInProgress - Error from
task_0001_m_000000_1: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2786)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.io.Text.write(Text.java:243)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(
        at org.apache.nutch.crawl.CrawlDbFilter.map(CrawlDbFilter.java:99)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java

Each of the 2 machines in my cluster has 512 MB of memory. Isn't that enough?
What is the best practice?

Do you have any idea whether this is a bug, or is it just my configuration that is
not correct?
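
For context, this is the kind of change I was considering in hadoop-site.xml to
raise the heap of each child task JVM. The property name mapred.child.java.opts
and the 512m value are only my guess at the right knob; I have not verified that
this is the correct fix:

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
    <description>JVM options passed to each map/reduce child task;
    intended here to raise the per-task heap.</description>
  </property>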

Thanks for your help
