nutch-user mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: Persistent Crawldb Checksum error
Date Mon, 05 Dec 2011 13:38:58 GMT
Hi Danicela,

Have a look here [1]. Although your problem is not directly linked to
fetching, the symptoms and the subsequent solution are the same.

Unfortunately this is quite a messy one, but it will hopefully get you
going in the right direction again.

[1]
http://wiki.apache.org/nutch/FAQ#How_can_I_recover_an_aborted_fetch_process.3F
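
For the crawldb itself, if the FAQ recipe doesn't get you all the way
there, one option is to salvage whatever records are still readable
into a fresh file and skip over the corrupt block. The sketch below is
untested and the paths are guesses based on your log; the key/value
types (Text/CrawlDatum) are what the 1.x crawldb uses:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;
import java.io.IOException;

public class SalvageCrawlDb {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.getLocal(conf);
    // Don't abort on the bad checksum; read whatever bytes are there.
    fs.setVerifyChecksum(false);

    // Adjust these paths to your layout (taken from your log).
    Path in = new Path(
        "/home/nutch/nutchexec/runs/fr4/crawldb/current/part-00000/data");
    Path out = new Path(
        "/home/nutch/nutchexec/runs/fr4/crawldb-salvaged/data");

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, in, conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, Text.class, CrawlDatum.class);

    Text key = new Text();
    CrawlDatum value = new CrawlDatum();
    long copied = 0;
    boolean more = true;
    while (more) {
      try {
        more = reader.next(key, value);
        if (more) {
          writer.append(key, value);
          copied++;
        }
      } catch (IOException e) {
        // Corrupt record: jump to the next sync marker and carry on.
        long pos = reader.getPosition();
        reader.sync(pos + 1);
        if (reader.getPosition() <= pos) break; // no progress, give up
      }
    }
    writer.close();
    reader.close();
    System.out.println("Salvaged " + copied + " entries");
  }
}

Bear in mind the part files under current/ are MapFiles, so you can't
just drop the salvaged SequenceFile back in place; the simpler route
may be to start a fresh crawldb and rebuild it from your segments with
'bin/nutch updatedb'. Alternatively, moving the sidecar .data.crc file
out of the way should make the local filesystem skip verification for
that file, but then you are trusting whatever bytes are actually on
disk.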

On Mon, Dec 5, 2011 at 1:06 PM, Danicela nutch <Danicela-nutch@mail.com> wrote:

> Hi,
>
>  I was running index jobs at the same time as updates, and after some
> successful index runs I think the crawldb became corrupted; since then
> every generate, update and index fails at the end of the job with the
> same error:
>
>
>  2011-12-03 05:47:44,017 WARN mapred.LocalJobRunner - job_local_0001
>  org.apache.hadoop.fs.ChecksumException: Checksum error: file:/home/nutch/nutchexec/runs/fr4/crawldb/current/part-00000/data at 3869358080
>  at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:278)
>  at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:242)
>  at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:177)
>  at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:194)
>  at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
>  at java.io.DataInputStream.readFully(DataInputStream.java:178)
>  at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
>  at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
>  at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1930)
>  at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2062)
>  at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
>  at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
>  at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
>  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>  at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
>  2011-12-03 05:47:44,509 FATAL crawl.CrawlDb - CrawlDb update: java.io.IOException: Job failed!
>  at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>  at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:94)
>  at org.apache.nutch.crawl.CrawlDb.run(CrawlDb.java:189)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>  at org.apache.nutch.crawl.CrawlDb.main(CrawlDb.java:150)
>
>
>
>  All attempts fail at the same offset (3869358080) in the 'data' file,
> which is why I think the crawldb itself has the problem.
>
>  What can I do to 'repair' the crawldb, if that is indeed the problem?
>
>  Thanks.
>



-- 
*Lewis*
