cocoon-dev mailing list archives

From "Unico Hommes"
Subject RE: [RT] Lucene Configuration
Date Tue, 06 Jan 2004 10:40:10 GMT

Upayavira wrote:
> Unico Hommes wrote:
> >Upayavira wrote:
> >  
> >
> >>Quick reply via a PDA...
> >>
> >>I'd like to add to your list:
> >>7) Ability to crawl a site using cocoon protocol rather than http. 
> >>Thus an index could be created as an offline process (e.g when the 
> >>site is statically generated, and only the search is dynamic - thus 
> >>http cannot provide link view.)
> >>
> >>    
> >>
> >
> >I already wrote this. One of the things that needs to be done is to
> >change the Crawler interface to take Strings instead of URLs though.
> >
> I started doing it, and that's what I saw. So it involves 
> changing a published interface. The question is, do we just 
> change the interface, or do we extend it to have a String 
> version too, and make the URL versions purely wrappers around 
> the String ones? What do the code guardians out there say?

Nah, URL is old. Alternatively, we could also use Sources if that maps
more closely. It may also save time to return resolved Sources instead
of Strings if clients need access to the input stream later on.
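
To make the "extend it" option concrete, here is a rough sketch of a
String-based crawl method with the URL variant kept as a thin wrapper.
All names here (UriCrawler, RecordingCrawler, CrawlerDemo) are made up
for illustration and are not the actual CocoonCrawler API; it also
assumes Java 8+ for the default method, where an abstract base class
would do the same job on older JDKs.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical interface for illustration; NOT Cocoon's CocoonCrawler.
interface UriCrawler {
    // Primary entry point: accepts any URI string, e.g. "cocoon:/index.html".
    void crawl(String uri);

    // Legacy entry point kept for existing callers; purely a wrapper.
    default void crawl(URL url) {
        crawl(url.toExternalForm());
    }

    // URIs collected so far.
    Iterator<String> iterator();
}

// Toy implementation that only records what it was asked to crawl.
class RecordingCrawler implements UriCrawler {
    private final List<String> seen = new ArrayList<>();

    @Override
    public void crawl(String uri) {
        seen.add(uri);
    }

    @Override
    public Iterator<String> iterator() {
        return seen.iterator();
    }
}

class CrawlerDemo {
    // Exercises both entry points and returns the recorded URIs in order.
    static List<String> run() {
        RecordingCrawler crawler = new RecordingCrawler();
        // A plain String avoids java.net.URL, which rejects schemes that
        // have no registered protocol handler.
        crawler.crawl("cocoon:/index.html");
        try {
            crawler.crawl(new URL("http://localhost:8888/index.html"));
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
        List<String> out = new ArrayList<>();
        crawler.iterator().forEachRemaining(out::add);
        return out;
    }
}
```

Existing callers that pass URLs keep working, while new callers can
pass cocoon: URIs directly. Returning resolved Sources instead of
Strings would change the iterator's element type and is not shown here.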

> I'd also like to hear how you've done this. Do you still 
> request the 'create index' page within a servlet? Or can you 
> generate offline, say with the CocoonBean?

I've not used it with Lucene at all. It's part of our publisher project.

> >If no one has any objections and I find some time, I could contribute
> >my code.
> >  
> >
> I would be delighted. I think we just need to clarify the 
> question of the interface change.
> >Btw. AFAICS the CocoonCrawler component is only used by Lucene block.
> >Shall we move it there?
> >  
> >
> I think so. It seems strange (and misleading) having it in the core.

OK, great. I'll try to find some time this week to work on it.

