From: "Stephen Elves"
Reply-To: nutch-user@lucene.apache.org
Date: Thu, 3 Sep 2009 12:02:49 +0100
Subject: Exception thrown during dedup

This is almost certainly an obvious problem, but I'm new to Nutch, so:

Whilst trying to crawl a couple of our sites I get the following error, which then halts the crawl:

Dedup: adding indexes in: crawl-20090902140756/indexes
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
        at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:448)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:149)

I've had a look in hadoop.log and found the following:

2009-09-02 14:29:33,437 WARN  mapred.LocalJobRunner - job_local_0025
java.lang.NullPointerException
        at org.apache.hadoop.io.Text.encode(Text.java:388)
        at org.apache.hadoop.io.Text.set(Text.java:178)
        at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:191)
        at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:157)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)

I've had a search round the internet and the archives for this list and haven't found anything related. Any help would be appreciated!

Cheers

Stephen Elves
Corporate GIS Manager
Strategy and Performance Unit
Department of Performance and Commissioning
City of Bradford Metropolitan District Council
t: 01274 437269
f: 01274 432004
7th Floor, Jacob's Well, Manchester Road,
Bradford, West Yorks, UK.
BD1 5RW

The information in this e-mail and any attachments is confidential.
It is intended solely for the attention and use of the named addressee(s). If you are not the intended recipient please notify the sender immediately. Unless you are the intended recipient you are not authorised to, and must not, read, copy, distribute, use or retain this message or any part of it. This is a personal message and not representative of Bradford MDC or its policies.