nutch-user mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: Parse and DBUpdate Exception
Date Wed, 21 Aug 2013 14:59:00 GMT
Hi Ward,
The main problem with this setup seems to have been the
gora-sql-mapping.xml config file. The one which ships with Nutch was only
ever meant as a guide and has proven, time after time, to be unsuitable for
many setups. That said, it should be noted that the entire gora-sql module is
now deprecated, as it was known to be buggy. I would therefore suggest that
you debug the parsing and updatedb phases of the crawl, which are causing the
headache.
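One possible starting point for that debugging: the stack traces quoted below
only show the NutchJob wrapper ("job failed"), not the underlying cause. The
following is a minimal sketch, assuming Nutch runs in local mode and writes its
log to logs/hadoop.log (adjust the path if your setup differs). The names
find_exceptions and LOG_PATH are purely illustrative and are not part of Nutch.

```python
import re
from pathlib import Path

# Assumption: local-mode Nutch log location; change this to match your install.
LOG_PATH = Path("logs/hadoop.log")

# Matches chained causes and fully qualified exception/error class names.
PATTERN = re.compile(r"(Caused by:|\b[\w.]+(?:Exception|Error)\b)")


def find_exceptions(lines, context=3):
    """Return (line_number, snippet) pairs for lines that name an exception.

    Stack frames (lines starting with "at ") are skipped so that only the
    exception headers and "Caused by:" lines are reported, each followed by
    up to `context` lines of trailing log output.
    """
    lines = list(lines)
    hits = []
    for i, line in enumerate(lines):
        if line.lstrip().startswith("at "):
            continue
        if PATTERN.search(line):
            snippet = [l.rstrip("\n") for l in lines[i:i + 1 + context]]
            hits.append((i + 1, snippet))
    return hits


if __name__ == "__main__":
    if LOG_PATH.exists():
        with LOG_PATH.open(errors="replace") as fh:
            for lineno, snippet in find_exceptions(fh):
                print(f"--- line {lineno}")
                print("\n".join(snippet))
    else:
        print(f"{LOG_PATH} not found; point LOG_PATH at your Nutch log")
```

Running this after a failed parse or updatedb step should list the exceptions
logged by the job itself, which is usually more informative than the
"job failed" wrapper printed on the console.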
Hth

On Wednesday, August 21, 2013, Ward Loving <ward@appirio.com> wrote:
> Hello all:
>
> I'm running Nutch 2.1 against a MySQL backend (5.6.11) on my Mac. I seeded
> Nutch with around 100 sites and I'm able to fetch ~12,000 pages and parse
> ~9,000 pages before getting the following error. After I get the
> error, Nutch seems to repeat the last fetch of the same group of sites.
> I'm not the first to get this error, but I haven't seen any insight into
> what might be causing it.
>
>
> Exception in thread "main" java.lang.RuntimeException: job failed:
> name=parse, jobid=job_local_0001
> at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
> at org.apache.nutch.parse.ParserJob.run(ParserJob.java:251)
> at org.apache.nutch.parse.ParserJob.parse(ParserJob.java:259)
> at org.apache.nutch.parse.ParserJob.run(ParserJob.java:302)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.nutch.parse.ParserJob.main(ParserJob.java:306)
> DbUpdaterJob: starting
> 2013-08-19 01:09:23.039 java[12666:1703] Unable to load realm info from
> SCDynamicStore
> Exception in thread "main" java.lang.RuntimeException: job failed:
> name=update-table, jobid=job_local_0001
> at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
> at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:98)
> at org.apache.nutch.crawl.DbUpdaterJob.updateTable(DbUpdaterJob.java:105)
> at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:119)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.nutch.crawl.DbUpdaterJob.main(DbUpdaterJob.java:123)
>
> My first thought was that I've used up the memory in my local MySQL table
> space or something.  But I'm able to do some manual inserts.
>
> Any ideas?
>
>
> --
> Ward Loving
> Senior Technical Consultant
> Appirio, Inc.
> www.appirio.com
> (706) 225-9475
>

-- 
*Lewis*
