Subject: Re: Persistent Crawldb Checksum error
From: Lewis John Mcgibbney
To: user@nutch.apache.org
Date: Mon, 5 Dec 2011 13:38:58 +0000

Hi Danicela,

Have a look here [1]. Although your problem is not directly linked to fetching, the symptoms, and the subsequent solution, are the same. Unfortunately this is quite a messy one, but it will hopefully get you going in the right direction again.
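Before going further it may also be worth checking whether a crawldb/old directory survived: depending on your version, updatedb keeps the previous copy of the db there, so if it predates the corruption you may be able to simply move it back over 'current' (take a backup first). Failing that, one option - just a rough, untested sketch on my part, with the paths taken from your log and the output location 'crawldb-repaired' made up for the example - is to copy whatever is still readable out of the damaged part file into a fresh MapFile, letting Hadoop skip the corrupted chunks via io.skip.checksum.errors instead of aborting:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;

public class SalvageCrawlDb {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Skip sequence-file entries whose checksum no longer matches,
    // instead of failing the whole read (default is false).
    conf.setBoolean("io.skip.checksum.errors", true);
    FileSystem fs = FileSystem.getLocal(conf);

    // Damaged part file (path taken from the log); output dir is an example.
    Path damaged = new Path(
        "/home/nutch/nutchexec/runs/fr4/crawldb/current/part-00000/data");
    String repaired =
        "/home/nutch/nutchexec/runs/fr4/crawldb-repaired/current/part-00000";

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, damaged, conf);
    MapFile.Writer writer =
        new MapFile.Writer(conf, fs, repaired, Text.class, CrawlDatum.class);

    Text url = new Text();
    CrawlDatum datum = new CrawlDatum();
    long kept = 0;
    while (reader.next(url, datum)) {  // corrupted chunks are skipped here
      writer.append(url, datum);
      kept++;
    }
    reader.close();
    writer.close();
    System.out.println("Salvaged " + kept + " crawldb entries");
  }
}

You would need the Nutch and Hadoop jars on the classpath to run it, and whatever entries sat inside the corrupted chunks are gone for good, so expect the salvaged db to be a little smaller. Once you are happy with the result, move the original crawldb aside and put the repaired 'current' directory in its place.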
[1] http://wiki.apache.org/nutch/FAQ#How_can_I_recover_an_aborted_fetch_process.3F

On Mon, Dec 5, 2011 at 1:06 PM, Danicela nutch wrote:
> Hi,
>
> I was indexing at the same time as updating, and after some successful
> indexes I think the crawldb was corrupted; since then all generates,
> updates and indexes fail at the end of the process with the same error:
>
> 2011-12-03 05:47:44,017 WARN mapred.LocalJobRunner - job_local_0001
> org.apache.hadoop.fs.ChecksumException: Checksum error:
> file:/home/nutch/nutchexec/runs/fr4/crawldb/current/part-00000/data at 3869358080
>   at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:278)
>   at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:242)
>   at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:177)
>   at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:194)
>   at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
>   at java.io.DataInputStream.readFully(DataInputStream.java:178)
>   at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
>   at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
>   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1930)
>   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2062)
>   at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
>   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
>   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
> 2011-12-03 05:47:44,509 FATAL crawl.CrawlDb - CrawlDb update: java.io.IOException: Job failed!
>   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>   at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:94)
>   at org.apache.nutch.crawl.CrawlDb.run(CrawlDb.java:189)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at org.apache.nutch.crawl.CrawlDb.main(CrawlDb.java:150)
>
> All attempts fail at the same offset (3869358080) in the 'data' file, which
> is why I think the crawldb has a problem.
>
> What can I do to 'repair' the crawldb, if that is indeed the problem?
>
> Thanks.

--
Lewis