cocoon-dev mailing list archives

From "Vadim Gritsenko" <vadim.gritse...@verizon.net>
Subject RE: Crawler/Indexer redesign
Date Sat, 02 Feb 2002 19:27:26 GMT
> From: Bernhard Huber [mailto:berni_huber@a1.net]
> 
>   hi,
> 
> As I'm not totally happy with the Crawler and Indexer component interfaces,
> I want to address some issues here:
> 
> Today CocoonCrawler exposes:
>  void crawl(URL), and Iterator iterator();
> crawl() sets the base URL, and iterator() delivers, one by one, the URLs
> reachable from the base URL.
> I have some headaches using URL objects in the command-line environment.
> The only simple possibility is to use file: URLs, which implies storing
> the crawled XML document to the filesystem. But I want to avoid storing
> it to the filesystem for the sake of performance.
> 
> Thus I was thinking of changing the interface to:
>  void crawl(Source), and Iterator iterator();
> i.e. working with Source objects instead of URL objects.
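A minimal sketch of the proposed Source-based shape, keeping the stateful `crawl()` + `iterator()` pair. Everything here is hypothetical: `Source` is a two-method stand-in (not Cocoon's actual interface), `StringSource` and `SourceCrawler` are illustrative names, and links are found with a naive `href="..."` regex rather than real XML parsing. The point it shows is that an in-memory Source lets the crawler read a document without ever writing it to the filesystem.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical minimal stand-in for a Source: an id plus a readable stream.
interface Source {
    String getSystemId();
    InputStream getInputStream() throws IOException;
}

// In-memory Source: the crawled document never touches the filesystem.
class StringSource implements Source {
    private final String systemId;
    private final byte[] content;

    StringSource(String systemId, String content) {
        this.systemId = systemId;
        this.content = content.getBytes(StandardCharsets.UTF_8);
    }

    public String getSystemId() { return systemId; }

    public InputStream getInputStream() {
        return new ByteArrayInputStream(content);
    }
}

// Stateful crawler: crawl(Source) records the links of the base document,
// iterator() then hands them out. The instance holds per-crawl state.
class SourceCrawler {
    // Naive link extraction; a real crawler would parse the XML/HTML.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");
    private final List<String> links = new ArrayList<>();

    void crawl(Source base) {
        links.clear();
        String text;
        try (InputStream in = base.getInputStream()) {
            text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Matcher m = HREF.matcher(text);
        while (m.find()) {
            links.add(m.group(1));
        }
    }

    Iterator<String> iterator() {
        return links.iterator();
    }
}
```

Because `links` lives in the crawler instance, two callers sharing one `SourceCrawler` would overwrite each other's results between `crawl()` and `iterator()`.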

How about 

  Collection crawl(Source)

? Then the crawler can be ThreadSafe.
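A sketch of why returning a Collection helps: the crawler keeps no per-crawl fields, so one instance can serve concurrent callers, which is what Avalon's `ThreadSafe` marker is meant to signal. The names `Source`, `StringSource` and `StatelessCrawler` are hypothetical, the `Source` interface is a minimal stand-in rather than Cocoon's, and link extraction is again a naive regex. The code does not implement Avalon's `ThreadSafe` itself, to stay free of external dependencies.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical minimal stand-in for a Source.
interface Source {
    String getSystemId();
    InputStream getInputStream() throws IOException;
}

// In-memory Source used for the example.
class StringSource implements Source {
    private final String systemId;
    private final byte[] content;

    StringSource(String systemId, String content) {
        this.systemId = systemId;
        this.content = content.getBytes(StandardCharsets.UTF_8);
    }

    public String getSystemId() { return systemId; }

    public InputStream getInputStream() {
        return new ByteArrayInputStream(content);
    }
}

// Stateless crawler: each call builds and returns its own collection.
// The only shared field is an immutable, thread-safe Pattern; the Matcher
// and result list are local, so concurrent calls cannot interfere.
final class StatelessCrawler {
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    Collection<String> crawl(Source base) {
        String text;
        try (InputStream in = base.getInputStream()) {
            text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(text);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }
}
```

The trade-off is that the whole result is materialized before the caller sees it; an iterator can in principle stream links lazily, but only at the cost of keeping crawl state somewhere.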


Vadim

 
> The LuceneCocoonIndexer should also change from using URL to using Source.
> 
> The main reason for this change is that crawling and indexing
> today work only with the http: protocol.
> If you want to index XML documents of the local Cocoon, or if you want
> to create an index in the command-line version of Cocoon, you may not be
> able to use the http protocol.
> Thus I was thinking about using Source.
> 
> Perhaps someone with a broader and more detailed understanding of the
> Cocoon internals could help me a bit.
> 
> bye bernhard
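A sketch of the protocol-independence argument: if the crawler and indexer only ever see a Source, the choice of protocol moves into whatever resolves a system id to a Source. `SchemeResolver`, `MemorySources` and the `mem:` scheme are hypothetical illustrations, not Cocoon's actual resolver; a real setup would register factories for schemes such as `http:`, `file:` or Cocoon's internal pipelines instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical minimal stand-in for a Source.
interface Source {
    String getSystemId();
    InputStream getInputStream() throws IOException;
}

// Hypothetical resolver: picks a Source factory by URI scheme, so code
// written against Source never needs to know which protocol is behind it.
final class SchemeResolver {
    private final Map<String, Function<String, Source>> factories = new HashMap<>();

    void register(String scheme, Function<String, Source> factory) {
        factories.put(scheme, factory);
    }

    Source resolve(String systemId) {
        int colon = systemId.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("no scheme: " + systemId);
        }
        Function<String, Source> factory = factories.get(systemId.substring(0, colon));
        if (factory == null) {
            throw new IllegalArgumentException("unsupported scheme: " + systemId);
        }
        return factory.apply(systemId);
    }

    // Reads a resolved Source fully as UTF-8 text.
    static String readAll(Source s) {
        try (InputStream in = s.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Example backend: documents held in memory (e.g. produced in-process),
// so neither HTTP nor temporary files are needed.
final class MemorySources {
    static Function<String, Source> backedBy(Map<String, String> docs) {
        return id -> new Source() {
            public String getSystemId() { return id; }

            public InputStream getInputStream() throws IOException {
                String body = docs.get(id);
                if (body == null) {
                    throw new IOException("not found: " + id);
                }
                return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
            }
        };
    }
}
```

With this split, the command-line environment would only need to register different factories; the crawler and indexer code would stay the same.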



