lucene-solr-user mailing list archives

From "Robert Petersen" <rober...@buy.com>
Subject RE: term position question from analyzer stack for WordDelimiterFilterFactory
Date Tue, 26 Apr 2011 20:39:49 GMT
OK this is even more weird... everything is working much better except
for one thing: I was testing use cases with our top query terms to make
sure the below query settings wouldn't break any existing behavior, and
got this most unusual result.  The analyzer stack completely eliminated
the word McAfee from the query terms!  I'm like huh?  Here is the
analyzer page output for that search term:

Query Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position 	1
term text 	McAfee
term type 	word
source start,end 	0,6
payload 	
org.apache.solr.analysis.SynonymFilterFactory
{synonyms=query_synonyms.txt, expand=true, ignoreCase=true}
term position 	1
term text 	McAfee
term type 	word
source start,end 	0,6
payload 	
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
ignoreCase=true}
term position 	1
term text 	McAfee
term type 	word
source start,end 	0,6
payload 	
org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0,
generateNumberParts=0, catenateWords=0, generateWordParts=0,
catenateAll=0, catenateNumbers=0}
term position
term text
term type
source start,end
payload
org.apache.solr.analysis.LowerCaseFilterFactory {}
term position
term text
term type
source start,end
payload
com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory
{protected=protwords.txt}
term position
term text
term type
source start,end
payload
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position
term text
term type
source start,end
payload
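
A possible explanation, not confirmed anywhere in this thread: WDF still
splits McAfee at the lower-to-upper case change into "Mc" and "Afee", and
with every generate/catenate option and preserveOriginal set to 0 it has
nothing left to emit, so the term vanishes. If that is what is happening,
turning off case-change splitting on the query side should let the term
through as a single token. A minimal sketch, assuming the installed Solr
version supports the splitOnCaseChange attribute (untested against this
schema):

```xml
<!-- Query-side WDF sketch. splitOnCaseChange="0" is the only change from
     the all-off settings; it is an assumption, not a setting taken from
     this thread. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="0"
        generateNumberParts="0"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        preserveOriginal="0"
        splitOnCaseChange="0"
        />
```

Even if this helps for McAfee, terms containing other delimiters (a
hyphenated term, for example) would presumably still be split and then
dropped while all of the generate options stay off.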



-----Original Message-----
From: Robert Petersen [mailto:robertpe@buy.com] 
Sent: Monday, April 25, 2011 11:27 AM
To: solr-user@lucene.apache.org; yonik@lucidimagination.com
Subject: RE: term position question from analyzer stack for
WordDelimiterFilterFactory

Aha!  I knew something must be awry, but when I looked at the analysis
page output, well it sure looked like it should match.  :)

OK, here is the query-side WDF that finally works: I just turned
everything off.  (yay)  First I tried completely removing WDF from
the query-side analyzer stack, but that didn't work.  So anyway, I suppose
I should turn off the catenateAll and preserveOriginal settings on the
index side, reindex, and see if I still get a match, huh?  (PS  thank you
very much for the help!!!)

          <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="0"
                generateNumberParts="0"
                catenateWords="0"
                catenateNumbers="0"
                catenateAll="0"
                preserveOriginal="0"
                />	



-----Original Message-----
From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
Seeley
Sent: Monday, April 25, 2011 9:24 AM
To: solr-user@lucene.apache.org
Subject: Re: term position question from analyzer stack for
WordDelimiterFilterFactory

On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen <robertpe@buy.com>
wrote:
> The search and index analyzer stack are the same.

Ahhh, they should not be!
Using both generate and catenate in WDF at query time is a no-no.
Same reason you can't have multi-word synonyms at query time:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

I'd recommend going back to the WDF settings in the solr example
server as a starting point.
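
For reference, the example schema.xml of that era used different WDF
settings for the index and query analyzers, roughly as sketched below.
This is recalled from memory rather than copied from a specific release,
so treat the attribute values as an assumption and verify them against
the schema.xml shipped with your Solr version:

```xml
<!-- Index-side WDF: generate the parts and also catenate them, so both
     forms end up in the index. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        catenateAll="0"
        splitOnCaseChange="1"
        />

<!-- Query-side WDF: generate the parts only, with no catenation, so
     generate and catenate are never combined at query time. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        splitOnCaseChange="1"
        />
```

Each filter goes inside the matching <analyzer type="index"> or
<analyzer type="query"> element of the field type.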


-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco
