lucene-java-user mailing list archives

From Doug Cutting <>
Subject Re: Setting Similarity in IndexWriter and IndexSearcher
Date Tue, 08 Jun 2004 19:44:47 GMT
David Spencer wrote:
> Does it ever make sense to set the Similarity object in only one of
> IndexWriter or IndexSearcher? That is, if I set it in IndexWriter, can
> I avoid setting it in IndexSearcher? Also, can I avoid setting it in
> IndexWriter and only set it in IndexSearcher? I noticed Nutch sets it in
> both places and was wondering what's going on behind the scenes...

No, it probably doesn't make sense to use a different Similarity 
implementation when indexing than when searching.  Ideally perhaps we'd 
have a LuceneConfiguration object, which encapsulates the Similarity, 
Analysis and Directory implementations, as well as perhaps other 
parameters.  And perhaps this could even be stored with the index, using 
Java object serialization.  However, I worry that this could cause more
confusion than it solves.  For example, one might not easily be able to
search an index if a class used when it was indexed is no longer
available when searching.  Tools like Luke could become more difficult
to write and use.
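A minimal sketch of the "LuceneConfiguration" idea, assuming a single object from which both the writer and the searcher are built. Everything here is hypothetical: `ScoringPolicy`, `TokenizerPolicy`, `LuceneConfiguration`, `ToyWriter` and `ToySearcher` are illustrative stand-ins, not Lucene classes (in Lucene itself the corresponding hooks are the `setSimilarity` methods on `IndexWriter` and `IndexSearcher`). The formulas are modeled on Lucene's `DefaultSimilarity` (length norm 1/sqrt(numTerms), idf = ln(numDocs/(docFreq+1)) + 1), and the toy score is simply raw tf * idf * stored norm. The point is that the length norm is fixed when a document is added, while idf and query analysis happen at search time, so the two halves only agree if they share one configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the role Lucene's Similarity plays (not the real class).
interface ScoringPolicy {
    float lengthNorm(int numTerms);        // index-time factor, stored per document
    float idf(int docFreq, int numDocs);   // search-time factor
}

// Hypothetical stand-in for the role an Analyzer plays.
interface TokenizerPolicy {
    List<String> tokenize(String text);
}

// Formulas modeled on Lucene's DefaultSimilarity.
final class DefaultLikeSimilarity implements ScoringPolicy {
    public float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }
    public float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
}

// Lowercasing whitespace tokenizer, standing in for a simple analyzer.
final class LowercaseTokenizer implements TokenizerPolicy {
    public List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }
}

// Hypothetical bundle: one object configures both indexing and searching.
final class LuceneConfiguration {
    final ScoringPolicy similarity;
    final TokenizerPolicy analyzer;

    LuceneConfiguration(ScoringPolicy similarity, TokenizerPolicy analyzer) {
        this.similarity = similarity;
        this.analyzer = analyzer;
    }
}

// Toy writer: tokenizes each document and stores its length norm.
final class ToyWriter {
    final List<List<String>> docs = new ArrayList<>();
    final List<Float> norms = new ArrayList<>();
    private final LuceneConfiguration config;

    ToyWriter(LuceneConfiguration config) { this.config = config; }

    void addDocument(String text) {
        List<String> tokens = config.analyzer.tokenize(text);
        docs.add(tokens);
        norms.add(config.similarity.lengthNorm(tokens.size()));
    }
}

// Toy searcher: scores a single-term query as tf * idf * stored norm.
final class ToySearcher {
    private final ToyWriter index;
    private final LuceneConfiguration config;

    ToySearcher(ToyWriter index, LuceneConfiguration config) {
        this.index = index;
        this.config = config;
    }

    float[] score(String queryText) {
        // The query goes through the same analyzer the documents did.
        String term = config.analyzer.tokenize(queryText).get(0);
        int numDocs = index.docs.size();
        int docFreq = 0;
        for (List<String> doc : index.docs) {
            if (doc.contains(term)) docFreq++;
        }
        float idf = config.similarity.idf(docFreq, numDocs);
        float[] scores = new float[numDocs];
        for (int i = 0; i < numDocs; i++) {
            int tf = 0;
            for (String t : index.docs.get(i)) {
                if (t.equals(term)) tf++;
            }
            scores[i] = tf * idf * index.norms.get(i);
        }
        return scores;
    }
}
```

Because both sides are built from one `LuceneConfiguration`, a query for "LUCENE" matches documents indexed as "lucene". Handing the searcher a configuration whose tokenizer did not lowercase would silently miss those matches, which is exactly the kind of index-time/search-time mismatch under discussion.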

By design, one does not have to declare things up-front with Lucene. 
For example, one never has to declare the set of fields and their types. 
Different documents in the same index can use different fields, or
even use the same field name differently.  Saving analyzers and 
similarity implementations with the index reduces this sort of 
flexibility somewhat.  If you rename your analysis or similarity class, 
does your index become invalid?  Lucene currently avoids such issues, at 
the expense of potential confusion about using different analyzers and 
similarity at index and search time.  But I don't think that, in practice,
the latter is a problem that needs more than a little documentation.
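To make the renaming question concrete, here is a minimal sketch assuming a hypothetical scheme (not something Lucene does) in which the Similarity implementation is recorded with the index by class name and reloaded by reflection. `MySimilarity`, `StoredConfig` and the `similarity.class` key are illustrative names only. Reloading works while the class exists under its original name and fails with a `ClassNotFoundException` (wrapped here in an `IllegalStateException`) once it is renamed; that is the brittleness Lucene avoids by not storing these choices with the index.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Stand-in for a user-written similarity class (hypothetical).
class MySimilarity {
    public MySimilarity() {}
}

// Hypothetical scheme: persist the similarity's class name alongside the index.
final class StoredConfig {
    static final String KEY = "similarity.class";

    // Records the implementation's fully qualified class name as index metadata.
    static String save(Class<?> similarityClass) {
        Properties meta = new Properties();
        meta.setProperty(KEY, similarityClass.getName());
        StringWriter out = new StringWriter();
        try {
            meta.store(out, "hypothetical index metadata");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Re-creates the similarity by name; fails once the class is renamed or missing.
    static Object load(String metadata) {
        Properties meta = new Properties();
        try {
            meta.load(new StringReader(metadata));
            Class<?> cls = Class.forName(meta.getProperty(KEY));
            return cls.getDeclaredConstructor().newInstance();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("index metadata names an unavailable class", e);
        }
    }
}
```

Java object serialization, also mentioned above, carries the same dependency: deserializing an object requires its class to be on the classpath.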

Sorry for the long-winded answer!

