lucene-java-user mailing list archives

From "Grant Ingersoll" <>
Subject Re: Setting Similarity in IndexWriter and IndexSearcher
Date Tue, 08 Jun 2004 21:46:44 GMT
I do this kind of thing as part of a layer between Lucene and my application, but I have often
thought it would be nice to have a metadata layer available that wasn't part of the Lucene
core, but was packaged w/ Lucene.  It could provide the necessary information and have tools
for updating it without messing w/ the index.

For instance, one of the things I store is the name of the field that is the "true" document
identifier (a unique String that won't change across Indexing) for that Document (which, as
Doug pointed out, can vary from Document to Document w/in the Index).
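
A minimal sketch of such a sidecar metadata layer, using only the JDK. Everything named here is an assumption for illustration, not part of Lucene: the IndexMetadata class, the index-metadata.properties file name, and the id.field key are all hypothetical. It records a single index-wide identifier field, so it does not cover the case where the identifier field varies from Document to Document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical application-level metadata store; not part of Lucene.
class IndexMetadata {
    // Hypothetical file name, kept beside the index files but never read by Lucene.
    static final String FILE_NAME = "index-metadata.properties";
    // Hypothetical key naming the field that holds the stable document identifier.
    static final String ID_FIELD_KEY = "id.field";

    private final Properties props = new Properties();

    // Load the metadata for an index directory; a missing file yields empty metadata.
    static IndexMetadata load(Path indexDir) throws IOException {
        IndexMetadata meta = new IndexMetadata();
        Path file = indexDir.resolve(FILE_NAME);
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                meta.props.load(in);
            }
        }
        return meta;
    }

    // Write the metadata back without touching any index file.
    void save(Path indexDir) throws IOException {
        try (OutputStream out = Files.newOutputStream(indexDir.resolve(FILE_NAME))) {
            props.store(out, "application-level index metadata");
        }
    }

    // Name of the identifier field, or the fallback if none was recorded.
    String idField(String fallback) {
        return props.getProperty(ID_FIELD_KEY, fallback);
    }

    void setIdField(String name) {
        props.setProperty(ID_FIELD_KEY, name);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index");
        IndexMetadata meta = IndexMetadata.load(dir);
        meta.setIdField("docKey");
        meta.save(dir);
        System.out.println(IndexMetadata.load(dir).idField("id"));
    }
}
```

Because the file holds only strings, it can be edited or regenerated without reopening the index.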


>>> 06/08/04 03:44PM >>>
David Spencer wrote:
> Does it ever make sense to set the Similarity obj in either (only one 
> of..) IndexWriter or IndexSearcher? i.e. If I set it in IndexWriter can 
> I avoid setting it in IndexSearcher? Also, can I avoid setting it in 
> IndexWriter and only set it in IndexSearcher? I noticed Nutch sets it in 
> both places and was wondering about what's going on behind the scenes...

No, it probably doesn't make sense to use a different Similarity 
implementation when indexing than when searching.  Ideally perhaps we'd 
have a LuceneConfiguration object, which encapsulates the Similarity, 
Analysis and Directory implementations, as well as perhaps other 
parameters.  And perhaps this could even be stored with the index, using 
Java object serialization.  However I worry that this could cause more 
confusion than it solves.  For example, one might not easily be able to 
search an index if a class used when it was indexed is no longer 
available when searching.  Tools like Luke could become more difficult 
to write and use.
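
The missing-class concern can be sketched with plain JDK code. This is not a Lucene API: StoredConfig, MySimilarity, and the similarity.class key are hypothetical, and the sketch records a class name in a Properties object as a simpler stand-in for full Java object serialization. Restoring fails in the same way when the named class is no longer on the classpath.

```java
import java.util.Properties;

// Hypothetical stored configuration that records an implementation by class name.
class StoredConfig {
    // Hypothetical key under which the similarity class name is saved.
    static final String SIMILARITY_KEY = "similarity.class";

    // Record the class name, as a configuration saved with the index would.
    static Properties describe(Class<?> similarityClass) {
        Properties p = new Properties();
        p.setProperty(SIMILARITY_KEY, similarityClass.getName());
        return p;
    }

    // Re-create the implementation; returns null if the class cannot be
    // found or instantiated, which is the failure described above.
    static Object restore(Properties p) {
        String name = p.getProperty(SIMILARITY_KEY);
        if (name == null) {
            return null;
        }
        try {
            return Class.forName(name).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Properties stored = describe(MySimilarity.class);
        System.out.println("restored: " + (restore(stored) != null));

        // Simulate an index written when the class had a different name.
        stored.setProperty(SIMILARITY_KEY, "NoLongerExistingSimilarity");
        System.out.println("restored: " + (restore(stored) != null));
    }
}

// Hypothetical stand-in for a similarity implementation; not Lucene's Similarity.
class MySimilarity {
}
```

A tool that only wants to inspect the index would hit the same failure unless it carried every class the index ever referred to.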

By design, one does not have to declare things up-front with Lucene. 
For example, one never has to declare the set of fields and their types. 
  Different documents in the same index can use different fields, or 
even use the same field name differently.  Saving analyzers and 
similarity implementations with the index reduces this sort of 
flexibility somewhat.  If you rename your analysis or similarity class, 
does your index become invalid?  Lucene currently avoids such issues, at 
the expense of potential confusion about using different analyzers and 
similarity at index and search time.  But I don't think the latter is in 
practice a problem that needs more than a little documentation.
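
The kind of confusion meant here is easiest to show with analyzers rather than Similarity. The following is a toy inverted index written against the JDK only, not Lucene code; ToyIndex and its two analyzer functions are hypothetical. It assumes a term is matched by exact string equality, so a query analyzed differently from the indexed text can silently find nothing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Toy inverted index (not Lucene): maps each term to the ids of documents containing it.
class ToyIndex {
    // Hypothetical analyzer: splits on whitespace and lowercases each token.
    static final Function<String, List<String>> LOWERCASING = text -> tokens(text, true);
    // Hypothetical analyzer: splits on whitespace and keeps the original case.
    static final Function<String, List<String>> CASE_PRESERVING = text -> tokens(text, false);

    private final Map<String, Set<Integer>> postings = new HashMap<>();

    static List<String> tokens(String text, boolean lowercase) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(lowercase ? t.toLowerCase(Locale.ROOT) : t);
            }
        }
        return out;
    }

    // Index a document using the given analyzer.
    void add(int docId, String text, Function<String, List<String>> analyzer) {
        for (String term : analyzer.apply(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Look up the first term the analyzer produces from the query.
    Set<Integer> search(String query, Function<String, List<String>> analyzer) {
        List<String> terms = analyzer.apply(query);
        if (terms.isEmpty()) {
            return Collections.emptySet();
        }
        return postings.getOrDefault(terms.get(0), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(1, "Apache Lucene", LOWERCASING);
        // Same analyzer at index and search time: the document is found.
        System.out.println(index.search("Lucene", LOWERCASING));
        // Different analyzer at search time: the stored term is "lucene", so nothing matches.
        System.out.println(index.search("Lucene", CASE_PRESERVING));
    }
}
```

Nothing in the index records which analyzer wrote it, so the mismatch produces no error, only missing results.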

Sorry for the long-winded answer!


To unsubscribe, e-mail: 
For additional commands, e-mail: 
