cocoon-dev mailing list archives

From Upayavira <upayav...@fwbo.org>
Subject Re: [RT] Lucene Configuration
Date Mon, 05 Jan 2004 15:46:32 GMT
Quick reply via a PDA...

I'd like to add to your list:
7) Ability to crawl a site using the cocoon protocol rather than HTTP. An index could then be
created as an offline process (e.g. when the site is statically generated and only the search
is dynamic, so HTTP cannot provide the link view).

Upayavira
Jeremy Quinn wrote:
> Hi All,
> 
> I had occasion to move an existing site that had Lucene integrated into 
> it, from a Tomcat to a Jetty setup.
> 
> I noticed during this that while Lucene is a great search engine, it 
> can be very difficult to configure under certain circumstances, due to 
> some internal inconsistencies.
> 
> Here is a list of _some_ of the aspects that need configuring:
> 
> 1. The root directory where each Lucene index is stored
> 2. The actual Lucene index to use or create
> 3. The Analyzer to use for searching and creation
> 4. The set of patterns to exclude while crawling
> 5. The set of fields to store during index creation
> 6. The cocoon-views to use for content and link extraction
> 
> 
> 
> The first problem I came across is with (1) above: the 'index' 
> directory used by Lucene defaults to Jetty's 'work' directory 
> ('/private/tmp/Jetty__8888__/cocoon-files/' OMM), which gets cleaned out 
> each time Jetty is restarted (Tomcat does not do this), meaning you 
> lose the indexes. So when you are using Jetty, you almost certainly 
> need to reset this.
> 
> Two separate components need this parameter, the Searcher and the 
> Indexer. If you have multiple independently searchable sub-sites in one 
> Servlet, you would need all of them to use the same config, 
> differentiating between multiple indexes via param (2) above.
> 
> SimpleLuceneCocoonSearcherImpl reads an optional <directory/> parameter 
> from cocoon.xconf, but it has no effect, because the SearchGenerator 
> resets this during its setup.
> 
> SimpleLuceneCocoonIndexerImpl does not pick up configuration from the 
> <directory/> parameter, even though its name is declared as a static 
> variable. This parameter actually gets passed from create-index.xsp, so 
> you need to modify the indexer XSP to set the base location of the 
> indexes.
> 
> It appears that the only way to set a custom location for Lucene's 
> indexes for searching is to put an absolute path to them in the 
> SearchGenerator's <index/> parameter in your SiteMap, i.e. in parameter 
> (2) above. This is not good IMHO.
> 
> 
> The next inconsistency is that the Analyzer classname (parameter (3) 
> above) can be set in cocoon.xconf on both the Searcher and the Indexer, 
> but again is overridden by SearchGenerator and create-index.xsp. While 
> I am not completely sure who needs to change the Analyzer or why, I 
> strongly suspect it could need to be different for each index in a 
> multi-index site. I do not think this is possible with the current 
> design.
> 
> 
> The next set of params, (4) & (5) above, should not IMHO be global. If, 
> again, you are setting up multiple sub-sites each with their own search 
> index, you would legitimately need separate settings for each of these, 
> as they are likely to have different URLs, document structures, etc.
> 
> 
> Param (6) above, is less clear-cut ..... would there be a genuine need 
> to have different settings for view-names for separate site-indexes?
> 
> 
> I do not have a proper proposal yet ..... I would like to discuss how 
> to best rationalise this situation, but have no wish to trample on 
> other people's configuration needs ..... to start with, do you think my 
> analysis is correct?
> 
> 
> regards Jeremy
> 
> 
> 
> 
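
As a rough illustration of the split discussed in the quoted message (one shared index root, with the remaining settings kept per index), the following Java sketch may help. It is not a proposal from this thread and does not use Cocoon's or Lucene's actual API: IndexRegistry, IndexSettings, their fields, and the example path /var/data/lucene are all hypothetical names chosen for this sketch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical registry: one shared index root (item 1), settings per index (items 2-5). */
class IndexRegistry {

    /** Hypothetical per-index settings; none of these names come from Cocoon. */
    static final class IndexSettings {
        final String indexName;             // item 2: which index to use or create
        final String analyzerClassName;     // item 3: analyzer for indexing and searching
        final List<String> excludePatterns; // item 4: patterns to skip while crawling
        final List<String> storedFields;    // item 5: fields to store in the index

        IndexSettings(String indexName, String analyzerClassName,
                      List<String> excludePatterns, List<String> storedFields) {
            this.indexName = indexName;
            this.analyzerClassName = analyzerClassName;
            this.excludePatterns = excludePatterns;
            this.storedFields = storedFields;
        }
    }

    private final Path indexRoot; // item 1: configured once, outside any servlet work directory
    private final Map<String, IndexSettings> settingsByName = new LinkedHashMap<>();

    IndexRegistry(Path indexRoot) {
        this.indexRoot = indexRoot;
    }

    void register(IndexSettings settings) {
        settingsByName.put(settings.indexName, settings);
    }

    /** Looks up the settings for one sub-site index, failing loudly if it is unknown. */
    IndexSettings settingsFor(String indexName) {
        IndexSettings settings = settingsByName.get(indexName);
        if (settings == null) {
            throw new IllegalArgumentException("Unknown index: " + indexName);
        }
        return settings;
    }

    /** Both a searcher and an indexer would resolve the same directory from here. */
    Path directoryFor(String indexName) {
        return indexRoot.resolve(settingsFor(indexName).indexName);
    }

    public static void main(String[] args) {
        // Assumed example root; any stable location that survives restarts would do.
        IndexRegistry registry = new IndexRegistry(Paths.get("/var/data/lucene"));
        registry.register(new IndexSettings(
                "docs",
                "org.apache.lucene.analysis.standard.StandardAnalyzer",
                Arrays.asList(".*\\.css"),
                Arrays.asList("title")));
        System.out.println(registry.directoryFor("docs"));
    }
}
```

In this sketch the index name stays a short relative name and the root is set in exactly one place, so no component needs an absolute path in its own index parameter. Whether view names (item 6) belong in the per-index settings is left open, as it is in the message.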

