cocoon-dev mailing list archives

From "Vadim Gritsenko"
Subject RE: Crawler/Indexer redesign
Date Sat, 02 Feb 2002 19:27:26 GMT
> From: Bernhard Huber
> Hi,
> As I'm not totally happy with the Crawler and Indexer components,
> I want to address some issues here.
> Today CocoonCrawler exposes:
>  void crawl(URL), and Iterator iterator();
> crawl() sets the base URL, and iterator() delivers the URLs
> found from the base URL, one at a time.
> I have some headaches using URL objects on the command line.
> The only simple possibility is to use file: URLs, which implies that
> the XML document has already been crawled to the filesystem. But I want
> to avoid storing it to the filesystem, for the sake of performance.
> Thus I was thinking of changing the interface to:
>  void crawl(Source), and Iterator iterator();
> and so working with Source objects instead of URL objects.

How about 

  Collection crawl(Source)

? Then the crawler can be ThreadSafe, since it no longer has to keep
per-crawl state between the crawl() and iterator() calls.
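
To illustrate the idea, here is a minimal sketch of a crawler whose single
method returns everything it found. It is not the actual Cocoon code: the
Source class below is a hypothetical stand-in for Cocoon's Source, and
Crawler, LinkTableCrawler, the in-memory link table and the cocoon:/ ids
are all made up for this example. The point is only that the traversal
state lives in local variables, so one instance can serve several threads.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CrawlSketch {

    // Hypothetical stand-in for a Cocoon Source: just an identifier here.
    static final class Source {
        final String systemId;
        Source(String systemId) { this.systemId = systemId; }
    }

    // The proposed shape: one call in, all discovered sources out.
    interface Crawler {
        Collection<Source> crawl(Source base);
    }

    // Toy crawler over an in-memory link table (an assumption of this sketch;
    // a real crawler would resolve each Source and extract its links).
    static final class LinkTableCrawler implements Crawler {
        // Copied once in the constructor, then only read.
        private final Map<String, List<String>> links;

        LinkTableCrawler(Map<String, List<String>> links) {
            this.links = new HashMap<String, List<String>>(links);
        }

        public Collection<Source> crawl(Source base) {
            // All traversal state is local, so concurrent calls do not interfere.
            Set<String> seen = new LinkedHashSet<String>();
            Deque<String> todo = new ArrayDeque<String>();
            todo.add(base.systemId);
            while (!todo.isEmpty()) {
                String id = todo.remove();
                if (!seen.add(id)) {
                    continue; // already visited
                }
                List<String> out = links.get(id);
                if (out != null) {
                    todo.addAll(out);
                }
            }
            List<Source> result = new ArrayList<Source>();
            for (String id : seen) {
                result.add(new Source(id));
            }
            return result;
        }
    }

    // Helper for printing and checking: the ids of the returned sources, in order.
    static List<String> ids(Collection<Source> sources) {
        List<String> out = new ArrayList<String>();
        for (Source s : sources) {
            out.add(s.systemId);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<String, List<String>>();
        links.put("cocoon:/index", Arrays.asList("cocoon:/a", "cocoon:/b"));
        links.put("cocoon:/a", Arrays.asList("cocoon:/b", "cocoon:/index"));
        Crawler crawler = new LinkTableCrawler(links);
        System.out.println(ids(crawler.crawl(new Source("cocoon:/index"))));
    }
}
```

With void crawl(Source) plus iterator(), the component would have to
remember the base and the iteration position in fields between the two
calls, which is what rules out sharing one instance.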


> The LuceneCocoonIndexer should also change from using URL to using Source.
> The main reason for this change is that crawling and indexing
> today work only with the http: protocol.
> If you want to index XML documents of the local Cocoon, or if you want
> to create an index in the command-line version of Cocoon, you may not
> be able to use the http: protocol.
> Thus I was thinking about using Source.
> Perhaps someone with a broader and more detailed understanding of
> Cocoon internals could help me a bit.
> Bye, Bernhard
