nutch-user mailing list archives

From "Stephen Elves" <stephen.el...@bradford.gov.uk>
Subject Exception thrown during dedup
Date Thu, 03 Sep 2009 11:02:49 GMT
This is almost certainly an obvious problem, but I'm new to Nutch, so:
 
Whilst trying to crawl a couple of our sites I get the following error
which then halts the crawl:
 
        Dedup: adding indexes in: crawl-20090902140756/indexes
        Exception in thread "main" java.io.IOException: Job failed!
                at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
                at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:448)
                at org.apache.nutch.crawl.Crawl.main(Crawl.java:149)
 
I've had a look in hadoop.log and found the following:
 
        2009-09-02 14:29:33,437 WARN  mapred.LocalJobRunner - job_local_0025
        java.lang.NullPointerException
         at org.apache.hadoop.io.Text.encode(Text.java:388)
         at org.apache.hadoop.io.Text.set(Text.java:178)
         at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:191)
         at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:157)
         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
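
Reading the trace from the top, the NullPointerException is raised inside Text.encode, which was called from Text.set, which was called from DDRecordReader.next. That pattern would be consistent with a null string being handed to Text.set, although the trace alone does not say which value was null or why. The sketch below only illustrates that general failure mode and does not use Hadoop or Nutch classes: encodedLength is a stand-in for an encoder that dereferences its argument, the Map is a hypothetical stand-in for an indexed document, and the field name "url" is an assumption made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class NullFieldSketch {

    // Stand-in for a string encoder (NOT Hadoop's Text.encode).
    // Calling getBytes on a null reference throws NullPointerException.
    static int encodedLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Hypothetical guard: check that a field is present before encoding it.
    static boolean hasField(Map<String, String> doc, String name) {
        return doc.get(name) != null;
    }

    public static void main(String[] args) {
        // Hypothetical "documents": one with the assumed "url" field, one without.
        Map<String, String> complete = new HashMap<String, String>();
        complete.put("url", "http://example.com/");
        Map<String, String> incomplete = new HashMap<String, String>();

        System.out.println("complete has url: " + hasField(complete, "url"));
        System.out.println("incomplete has url: " + hasField(incomplete, "url"));

        try {
            // A missing field yields null, and encoding null fails.
            encodedLength(incomplete.get("url"));
        } catch (NullPointerException e) {
            System.out.println("NullPointerException when encoding a missing field");
        }
    }
}
```

If that reading applies here, the thing to look for would be an indexed document that lacks a value the dedup step expects to find.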

I've had a search round the internet and the archives for this list and
haven't found anything that relates. Any help would be appreciated!
 
Cheers
Stephen Elves
Corporate GIS Manager
Strategy and Performance Unit
Department of Performance and Commissioning
City of Bradford Metropolitan District Council
t: 01274 437269
f: 01274 432004
7th Floor, Jacob's Well, Manchester Road, 
Bradford, West Yorks, UK. 
BD1 5RW