lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Trivial Update of "AnalyzersTokenizersTokenFilters" by ShalinMangar
Date Wed, 27 Jan 2010 11:51:20 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "AnalyzersTokenizersTokenFilters" page has been changed by ShalinMangar.
The comment on this change is: Fixed typo "seabisuit".
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?action=diff&rev1=71&rev2=72

--------------------------------------------------

  Keep in mind that while the !SynonymFilter will happily work with synonyms containing multiple
words (i.e. "`sea biscuit, sea biscit, seabiscuit`"), the recommended approach for dealing with
synonyms like this is to expand the synonym when indexing.  This is because there are two
potential issues that can arise at query time:
  
   1. The Lucene !QueryParser tokenizes on white space before giving any text to the Analyzer,
so if a person searches for the words `sea biscit` the analyzer will be given the words "sea"
and "biscit" separately, and will not know that they match a synonym.
-  1. Phrase searching (ie: `"sea biscit"`) will cause the !QueryParser to pass the entire
string to the analyzer, but if the !SynonymFilter is configured to expand the synonyms, then
when the !QueryParser gets the resulting list of tokens back from the Analyzer, it will construct
a !MultiPhraseQuery that will not have the desired effect.  This is because of the limited
mechanism available for the Analyzer to indicate that two terms occupy the same position:
there is no way to indicate that a "phrase" occupies the same position as a term.  For our
example the resulting !MultiPhraseQuery would be `"(sea | sea | seabiscuit) (biscuit | biscit)"`
which would not match the simple case of "seabisuit" occuring in a document
+  1. Phrase searching (ie: `"sea biscit"`) will cause the !QueryParser to pass the entire
string to the analyzer, but if the !SynonymFilter is configured to expand the synonyms, then
when the !QueryParser gets the resulting list of tokens back from the Analyzer, it will construct
a !MultiPhraseQuery that will not have the desired effect.  This is because of the limited
mechanism available for the Analyzer to indicate that two terms occupy the same position:
there is no way to indicate that a "phrase" occupies the same position as a term.  For our
example the resulting !MultiPhraseQuery would be `"(sea | sea | seabiscuit) (biscuit | biscit)"`
which would not match the simple case of "seabiscuit" occuring in a document
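
The position problem described in the second item can be illustrated with a small toy model. This is only a sketch and does not use Lucene or Solr: the function `multi_phrase_match` and the list-of-sets representation of token positions are assumptions made for illustration. Each position holds the set of terms that occupy it, and a phrase matches only if consecutive query positions line up with consecutive document positions.

```python
def multi_phrase_match(doc_positions, query_positions):
    """Toy stand-in for a positional phrase match (hypothetical helper).

    Both arguments are lists of sets: one set of terms per token position.
    Returns True if some run of consecutive document positions shares at
    least one term with each consecutive query position.
    """
    n, m = len(doc_positions), len(query_positions)
    for start in range(n - m + 1):
        if all(doc_positions[start + i] & query_positions[i] for i in range(m)):
            return True
    return False


# Query-time expansion of the phrase "sea biscit": two positions,
# mirroring "(sea | sea | seabiscuit) (biscuit | biscit)" from the text.
query_expanded = [{"sea", "seabiscuit"}, {"biscuit", "biscit"}]

# A document containing only "seabiscuit", indexed without expansion:
# a single position, so a two-position phrase can never line up.
doc_plain = [{"seabiscuit"}]

# The same document with synonyms expanded at index time.
doc_index_expanded = [{"sea", "seabiscuit"}, {"biscuit", "biscit"}]

# The unexpanded phrase query "sea biscit".
query_plain = [{"sea"}, {"biscit"}]

query_time_hit = multi_phrase_match(doc_plain, query_expanded)
index_time_hit = multi_phrase_match(doc_index_expanded, query_plain)
print(query_time_hit, index_time_hit)
```

In this model the query-time expansion fails against the plain document because the single-token form has nowhere to put its second position, while the index-time expansion lets the unexpanded phrase match.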
  
  Even when you aren't worried about multi-word synonyms, IDF differences still make index-time
synonyms a good idea. Consider the following scenario:
  
