cocoon-dev mailing list archives

From Jeremy Quinn <jer...@media.demon.co.uk>
Subject [RT] Lucene Configuration
Date Mon, 05 Jan 2004 12:20:38 GMT
Hi All,

I had occasion to move an existing site that had Lucene integrated into 
it from a Tomcat setup to a Jetty setup.

I noticed during this that while Lucene is a great search engine, it 
can be very difficult to configure under certain circumstances, due to 
some internal inconsistencies.

Here is a list of _some_ of the aspects that need configuring:

1. The root directory where each Lucene index is stored
2. The actual Lucene index to use or create
3. The Analyzer to use for searching and creation
4. The set of patterns to exclude while crawling
5. The set of fields to store during index creation
6. The cocoon-views to use for content and link extraction



The first problem I came across is with (1) above: the 'index' 
directory used by Lucene defaults to Jetty's 'work' directory 
('/private/tmp/Jetty__8888__/cocoon-files/' OMM), which gets cleaned out 
each time Jetty is restarted (Tomcat does not do this), meaning you 
lose the indexes. So when you are using Jetty, you almost certainly 
need to reset this.

Two separate components need this parameter: the Searcher and the 
Indexer. If you have multiple independently searchable sub-sites in one 
Servlet, you would need all of them to use the same config, 
differentiating between multiple indexes via param (2) above.
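
To make the relationship between params (1) and (2) concrete, here is a 
rough sketch of how a shared base directory and a per-sub-site index name 
could be combined. IndexLocator and its resolve() method are hypothetical 
names invented for illustration; they are not part of Cocoon or Lucene, 
and this is not how the current components behave.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of Cocoon or Lucene.
final class IndexLocator {
    private IndexLocator() {}

    /**
     * Resolves the directory of one index.
     *
     * @param workDir        the servlet container's work directory (the current default)
     * @param configuredBase optional base directory, i.e. param (1); may be null or blank
     * @param indexName      index name or absolute path, i.e. param (2)
     */
    static Path resolve(Path workDir, String configuredBase, String indexName) {
        Path index = Paths.get(indexName);
        // An absolute index path wins outright, mirroring the <index/> workaround.
        if (index.isAbsolute()) {
            return index.normalize();
        }
        // Otherwise fall back to the work directory only when no base is configured.
        Path base = (configuredBase == null || configuredBase.trim().isEmpty())
                ? workDir
                : Paths.get(configuredBase);
        return base.resolve(index).normalize();
    }
}
```

With something like this, the Searcher and the Indexer could both call 
IndexLocator.resolve() with the same base, and each sub-site would only 
need to supply its own index name.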

SimpleLuceneCocoonSearcherImpl reads an optional <directory/> parameter 
from cocoon.xconf, but it has no effect, because the SearchGenerator 
resets this during its setup.

SimpleLuceneCocoonIndexerImpl does not pick up configuration from the 
<directory/> parameter, even though its name is declared as a static 
variable. This parameter actually gets passed from create-index.xsp, so 
you need to modify the indexer XSP to set the base location of the 
indexes.

It appears that the only way to set a custom location for Lucene's 
indexes for searching is to put an absolute path to them in the 
SearchGenerator's <index/> parameter in your sitemap, i.e. in parameter 
(2) above. This is not good IMHO.


The next inconsistency is that the Analyzer classname (parameter (3) 
above) can be set in cocoon.xconf on both the Searcher and the Indexer, 
but again is overridden by SearchGenerator and create-index.xsp. While 
I am not completely sure who needs to change the Analyzer or why, I 
strongly suspect it could need to be different for each index in a 
multi-index site. I do not think this is possible with the current 
design.


The next set of params, (4) and (5) above, should not IMHO be global. 
If, again, you are setting up multiple sub-sites each with their own 
search index, you would legitimately need separate settings for each of 
these, as they are likely to have different URLs, document structures, etc.
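
As a discussion aid only, per-index settings for params (3), (4) and (5) 
might be modelled along these lines. IndexSettings and 
IndexSettingsRegistry are hypothetical types made up for this sketch; 
nothing like them exists in Cocoon today, and the global defaults are 
simply assumed to remain as a fallback.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value object: the settings one index would carry.
final class IndexSettings {
    final String analyzerClassName;      // param (3)
    final List<String> excludePatterns;  // param (4)
    final List<String> storedFields;     // param (5)

    IndexSettings(String analyzerClassName,
                  List<String> excludePatterns,
                  List<String> storedFields) {
        this.analyzerClassName = analyzerClassName;
        // Defensive copies so later changes to the caller's lists do not leak in.
        this.excludePatterns = Collections.unmodifiableList(new ArrayList<>(excludePatterns));
        this.storedFields = Collections.unmodifiableList(new ArrayList<>(storedFields));
    }
}

// Hypothetical registry: per-index settings, falling back to global defaults.
final class IndexSettingsRegistry {
    private final IndexSettings defaults;
    private final Map<String, IndexSettings> perIndex = new HashMap<>();

    IndexSettingsRegistry(IndexSettings defaults) {
        this.defaults = defaults;
    }

    void register(String indexName, IndexSettings settings) {
        perIndex.put(indexName, settings);
    }

    // Returns the settings registered for this index, or the global defaults.
    IndexSettings lookup(String indexName) {
        return perIndex.getOrDefault(indexName, defaults);
    }
}
```

A sub-site that needs, say, a different Analyzer would register its own 
IndexSettings under its index name, while every other index keeps the 
defaults. The Searcher and the Indexer would then both look settings up 
by index name rather than having them overridden from the sitemap or the 
XSP.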


Param (6) above is less clear-cut: would there be a genuine need 
to have different settings for view-names for separate site-indexes?


I do not have a proper proposal yet. I would like to discuss how 
best to rationalise this situation, but have no wish to trample on 
other people's configuration needs. To start with, do you think my 
analysis is correct?


regards Jeremy


