cocoon-users mailing list archives

From Michael Wechner
Subject Re: Lucene indexing / crawling problem
Date Mon, 09 Jun 2003 11:54:15 GMT
Conal Tuohy wrote:

>I'm creating a Lucene index using an XSP based on the sample, but I have a strange problem.
>Some of the pages are crawled, but some are not crawled, and I can't see why. 
>I have DEBUG logging for the components, so I can see the crawler crawling
>the site. I can see it read the links for each page, and I can see that it doesn't exclude
>any of the links. Yet it doesn't actually follow those links - the crawl simply comes to an
>end at some point, with some of the links uncrawled.

Have you enabled the "link view" for all the pages you want to crawl?



>It seems to me that for every log entry from SimpleCocoonCrawlerImpl that says "Add URL:
>http://blah..." I should also have an entry from SimpleLuceneXMLIndexerImpl that says "Indexing
>The home page is crawled, and all of the pages off that page, and SOME of the pages off
>those pages, and SOME of the pages off THOSE pages. I can't see why some pages are crawled
>and others not. Perhaps the crawler simply stops at some point, and it hasn't finished its
>list of URLs. But why would it stop crawling without logging any error? BTW, the last entry
>in the log is always the SimpleLuceneXMLIndexerImpl reporting that it has indexed a page,
>DEBUG   (2003-06-09) 17:32.05:388   [] (/search/reindex.xml) HttpProcessor[80][4]/SimpleLuceneXMLIndexerImpl:
>Indexing http://localhost:80/etexts/JCB-016/full.html?cocoon-view=content (text/xml)
>Does anyone have any ideas where I could start looking?
>I'm using the version RELEASE_2_1_M_2
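
One way to test the "Add URL" versus "Indexing" expectation described in the quoted message is to compare the two sets of log entries directly. The following is only a minimal sketch in Python. It assumes the log contains entries shaped like the two messages quoted in this thread; the regular expressions are guesses based on those quotes, not taken from the Cocoon source, and the function names (strip_view, unindexed_urls) are hypothetical.

```python
import re

# Assumed entry shapes, inferred from the messages quoted in this thread:
#   "... Add URL: <url>"
#   "... SimpleLuceneXMLIndexerImpl: Indexing <url> (<mime type>)"
# \s* also matches a line break, so mail-wrapped entries are still found.
ADD_RE = re.compile(r"Add URL:\s*(\S+)")
INDEX_RE = re.compile(r"SimpleLuceneXMLIndexerImpl:\s*Indexing\s+(\S+)")


def strip_view(url):
    """Drop the cocoon-view request parameter so both URL forms compare equal."""
    url = re.sub(r"([?&])cocoon-view=[^&]*&?", r"\1", url)
    return url.rstrip("?&")


def unindexed_urls(log_text):
    """Return URLs that were added but never reported as indexed, in log order."""
    added = [strip_view(u) for u in ADD_RE.findall(log_text)]
    indexed = {strip_view(u) for u in INDEX_RE.findall(log_text)}
    seen = set()
    missing = []
    for url in added:
        # Report each missing URL once, even if it was added several times.
        if url not in indexed and url not in seen:
            seen.add(url)
            missing.append(url)
    return missing
```

Calling unindexed_urls(open("path/to/logfile").read()) with the real log path would list the URLs that were queued but never indexed. That should show whether the crawl stops at one particular page or skips pages throughout.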
