Subject: Re: Persistent Crawldb Checksum error
From: Lewis John Mcgibbney
To: user@nutch.apache.org
Date: Mon, 5 Dec 2011 13:38:58 +0000

Hi Danicela,

Have a look here [1]. Although your problem is not directly linked to fetching, the symptoms, and the subsequent solution, are the same. Unfortunately this is quite a messy one, but it will hopefully get you going in the right direction again.
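Before going further it may also be worth checking whether a crawldb/old directory survived: depending on your version, updatedb keeps the previous copy of the db there, so if it predates the corruption you may be able to simply move it back over 'current' (take a backup first). Failing that, one option - just a rough, untested sketch on my part, with the paths taken from your log and the output location 'crawldb-repaired' made up for the example - is to copy whatever is still readable out of the damaged part file into a fresh MapFile, letting Hadoop skip the corrupted chunks via io.skip.checksum.errors instead of aborting:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;

public class SalvageCrawlDb {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Skip sequence-file entries whose checksum no longer matches,
    // instead of failing the whole read (default is false).
    conf.setBoolean("io.skip.checksum.errors", true);
    FileSystem fs = FileSystem.getLocal(conf);

    // Damaged part file (path taken from the log); output dir is an example.
    Path damaged = new Path(
        "/home/nutch/nutchexec/runs/fr4/crawldb/current/part-00000/data");
    String repaired =
        "/home/nutch/nutchexec/runs/fr4/crawldb-repaired/current/part-00000";

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, damaged, conf);
    MapFile.Writer writer =
        new MapFile.Writer(conf, fs, repaired, Text.class, CrawlDatum.class);

    Text url = new Text();
    CrawlDatum datum = new CrawlDatum();
    long kept = 0;
    while (reader.next(url, datum)) {  // corrupted chunks are skipped here
      writer.append(url, datum);
      kept++;
    }
    reader.close();
    writer.close();
    System.out.println("Salvaged " + kept + " crawldb entries");
  }
}

You would need the Nutch and Hadoop jars on the classpath to run it, and whatever entries sat inside the corrupted chunks are gone for good, so expect the salvaged db to be a little smaller. Once you are happy with the result, move the original crawldb aside and put the repaired 'current' directory in its place.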
[1] http://wiki.apache.org/nutch/FAQ#How_can_I_recover_an_aborted_fetch_process.3F

On Mon, Dec 5, 2011 at 1:06 PM, Danicela nutch wrote:
> Hi,
>
> I was indexing at the same time as updating, and after some successful
> indexes I think the crawldb was corrupted; since then all generates,
> updates and indexes fail at the end of the process with the same error:
>
> 2011-12-03 05:47:44,017 WARN mapred.LocalJobRunner - job_local_0001
> org.apache.hadoop.fs.ChecksumException: Checksum error:
> file:/home/nutch/nutchexec/runs/fr4/crawldb/current/part-00000/data at 3869358080
>   at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:278)
>   at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:242)
>   at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:177)
>   at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:194)
>   at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
>   at java.io.DataInputStream.readFully(DataInputStream.java:178)
>   at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
>   at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
>   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1930)
>   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2062)
>   at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
>   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
>   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
> 2011-12-03 05:47:44,509 FATAL crawl.CrawlDb - CrawlDb update: java.io.IOException: Job failed!
>   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>   at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:94)
>   at org.apache.nutch.crawl.CrawlDb.run(CrawlDb.java:189)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at org.apache.nutch.crawl.CrawlDb.main(CrawlDb.java:150)
>
> All attempts fail at the same offset (3869358080) in the 'data' file, which
> is why I think the crawldb has a problem.
>
> What can I do to 'repair' the crawldb, if that is indeed the problem?
>
> Thanks.

--
Lewis