nutch-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: NPE in CrawlDbReducer
Date Wed, 12 Apr 2006 15:59:29 GMT
Marko Bauhardt wrote:
> Hi all,
> I use nutch-0.8-dev with metadata. When I update the crawldb, an NPE
> occurs on line 71 of CrawlDbReducer.
>
> 060412 163435 job_yh0f7t
> java.lang.NullPointerException
>     at org.apache.nutch.crawl.CrawlDbReducer.reduce(CrawlDbReducer.java:71)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:283)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:144)
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
>     at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:54)
>     at org.apache.nutch.crawl.CrawlDb.main(CrawlDb.java:102)
>
>
> The reason is that the "highest" CrawlDatum has no metadata (null),
> and that null metadata is copied into the "result" CrawlDatum.
> Line 67:
>     result.set(highest);
>
>
> After that, the metadata from the "result" CrawlDatum is used.
> Line 71:
>         result.getMetaData().putAll(old.getMetaData());
>
>
> Is this a bug?
>

Yes. I think this should be fixed in CrawlDbReducer. Having this
CrawlDatum field be null seems awkward, and I considered initializing
it when a CrawlDatum is created, but that could lead to the unnecessary
creation of many MapWritable objects. So we'll fix it here.

Please try this patch:

Index: src/java/org/apache/nutch/crawl/CrawlDbReducer.java
===================================================================
--- src/java/org/apache/nutch/crawl/CrawlDbReducer.java (revision 393266)
+++ src/java/org/apache/nutch/crawl/CrawlDbReducer.java (working copy)
@@ -68,6 +68,7 @@
     if (old != null) {
       // copy metadata from old, if exists
       if (old.getMetaData() != null) {
+        if (result.getMetaData() == null) result.setMetaData(new MapWritable());
         result.getMetaData().putAll(old.getMetaData());
         // overlay with new, if any
         if (highest.getMetaData() != null)
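
For illustration, the same null guard can be sketched in isolation. This is only a minimal sketch under stated assumptions, not the actual Nutch code: `Datum` is a hypothetical stand-in for CrawlDatum, a plain `HashMap` replaces MapWritable, `mergeMetaData` is an invented helper name, and the final overlay step is assumed from the comment in the patch context rather than shown there.

```java
import java.util.HashMap;
import java.util.Map;

public class MetaMergeSketch {

    /** Hypothetical stand-in for CrawlDatum: metadata stays null until it is needed. */
    static class Datum {
        private Map<String, String> metaData; // deliberately not allocated up front

        Map<String, String> getMetaData() { return metaData; }

        void setMetaData(Map<String, String> metaData) { this.metaData = metaData; }
    }

    /** Copies metadata from old into result, then overlays highest, tolerating nulls. */
    static void mergeMetaData(Datum result, Datum old, Datum highest) {
        if (old == null || old.getMetaData() == null) {
            return; // nothing to copy
        }
        // The guard from the patch: allocate the map only when something is written.
        if (result.getMetaData() == null) {
            result.setMetaData(new HashMap<>());
        }
        result.getMetaData().putAll(old.getMetaData());
        // Assumed overlay step: newer values win over older ones.
        if (highest.getMetaData() != null) {
            result.getMetaData().putAll(highest.getMetaData());
        }
    }

    public static void main(String[] args) {
        Datum old = new Datum();
        old.setMetaData(new HashMap<>());
        old.getMetaData().put("k", "v");

        Datum highest = new Datum(); // no metadata, the case that triggered the NPE
        Datum result = new Datum();  // starts with null metadata as well

        mergeMetaData(result, old, highest);
        System.out.println(result.getMetaData()); // prints {k=v}
    }
}
```

Allocating the map at the point of use keeps datums without metadata cheap, which is the trade-off described above.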


-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


