lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: term position question from analyzer stack for WordDelimiterFilterFactory
Date Tue, 26 Apr 2011 22:46:18 GMT
I second Otis' comments. Is it possible that you've gotten twisted
around by trying to modify these settings and would be better off
going back to the WDFF settings in the example schema? I've
sometimes found that to be very useful.
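
A rough sketch of the example schema's WDFF setup, assuming the Solr
1.4/3.1-era "text" field type (the fieldType name here is hypothetical,
other filters are omitted, and the schema.xml in your own distribution
is the authority):

```xml
<!-- Hypothetical field type name; stop word, synonym and stemming filters omitted. -->
<fieldType name="text_wdf_sketch" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Index side: generate parts AND catenate, so "McAfee" should yield Mc, Afee and McAfee. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Query side: generate parts only, no catenation. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The point is the asymmetry: catenation happens only at index time, which
is the arrangement recommended further down the thread in place of
identical index and query stacks.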

Also (although I don't think it applies in this case) be aware that
the analysis page may introduce its own errors, so when you see
something really wonky, try a query with &debugQuery=on and see
if the parsed query squares with the results on the analysis page...

 Best
Erick

On Tue, Apr 26, 2011 at 5:44 PM, Robert Petersen <robertpe@buy.com> wrote:
> Yeah I am about to try turning one on at a time and see what happens.  I
> had a meeting so couldn't do it yet...  (darn those meetings)  (lol)
>
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Tuesday, April 26, 2011 2:37 PM
> To: solr-user@lucene.apache.org
> Subject: Re: term position question from analyzer stack for
> WordDelimiterFilterFactory
>
> Hi Robert,
>
> I'm no WDFF expert, but all these zeros look suspicious:
>
> org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0,
> generateNumberParts=0, catenateWords=0, generateWordParts=0,
> catenateAll=0, catenateNumbers=0}
>
> A quick visit to
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
> makes me think you want:
>
> splitOnCaseChange=1  (if you want Mc Afee for some reason?)
> generateWordParts=1 (if you want Mc Afee for some reason?)
> preserveOriginal=1
>
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> ----- Original Message ----
>> From: Robert Petersen <robertpe@buy.com>
>> To: solr-user@lucene.apache.org; yonik@lucidimagination.com
>> Sent: Tue, April 26, 2011 4:39:49 PM
>> Subject: RE: term position question from analyzer stack for
>>WordDelimiterFilterFactory
>>
>> OK this is even more weird... everything is working much better except
>> for one thing: I was testing use cases with our top query terms to make
>> sure the below query settings wouldn't break any existing behavior, and
>> got this most unusual result.  The analyzer stack completely eliminated
>> the word McAfee from the query terms!  I'm like huh?  Here is the
>> analyzer page output for that search term:
>>
>> Query Analyzer
>> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
>> term position     1
>> term text         McAfee
>> term type         word
>> source start,end  0,6
>> payload
>> org.apache.solr.analysis.SynonymFilterFactory
>> {synonyms=query_synonyms.txt, expand=true, ignoreCase=true}
>> term position     1
>> term text         McAfee
>> term type         word
>> source start,end  0,6
>> payload
>> org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
>> ignoreCase=true}
>> term position     1
>> term text         McAfee
>> term type         word
>> source start,end  0,6
>> payload
>> org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0,
>> generateNumberParts=0, catenateWords=0, generateWordParts=0,
>> catenateAll=0, catenateNumbers=0}
>> term position
>> term text
>> term type
>> source start,end
>> payload
>> org.apache.solr.analysis.LowerCaseFilterFactory {}
>> term position
>> term text
>> term type
>> source start,end
>> payload
>> com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory
>> {protected=protwords.txt}
>> term position
>> term text
>> term type
>> source start,end
>> payload
>> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
>> term position
>> term text
>> term type
>> source start,end
>> payload
>>
>>
>>
>> -----Original Message-----
>> From: Robert  Petersen [mailto:robertpe@buy.com]
>> Sent: Monday, April 25,  2011 11:27 AM
>> To: solr-user@lucene.apache.org; yonik@lucidimagination.com
>> Subject:  RE: term position question from analyzer stack  for
>> WordDelimiterFilterFactory
>>
>> Aha!  I knew something must be awry, but when I looked at the analysis
>> page output, well it sure looked like it should match.  :)
>>
>> OK here is the query side WDF that finally works, I just turned
>> everything off.  (yay)  First I tried just completely removing WDF from
>> the query side analyzer stack but that didn't work.  So anyway I suppose
>> I should turn off the catenate all plus the preserve original settings,
>> reindex, and see if I still get a match huh?  (PS thank you very much
>> for the help!!!)
>>
>>            <filter class="solr.WordDelimiterFilterFactory"
>>                  generateWordParts="0"
>>                  generateNumberParts="0"
>>                  catenateWords="0"
>>                  catenateNumbers="0"
>>                  catenateAll="0"
>>                  preserveOriginal="0"
>>                  />
>>
>>
>>
>> -----Original Message-----
>> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of  Yonik
>> Seeley
>> Sent: Monday, April 25, 2011 9:24 AM
>> To: solr-user@lucene.apache.org
>> Subject:  Re: term position question from analyzer stack  for
>> WordDelimiterFilterFactory
>>
>> On Mon, Apr 25, 2011 at 12:15 PM,  Robert Petersen <robertpe@buy.com>
>> wrote:
>> > The search and index analyzer stack are the same.
>>
>> Ahhh, they should not be!
>> Using both generate and catenate in WDF at query time is a no-no.
>> Same reason you can't have multi-word synonyms at query time:
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>
>> I'd recommend going back to the WDF settings in the solr example
>> server as a starting point.
>>
>>
>> -Yonik
>> http://www.lucenerevolution.org -- Lucene/Solr User  Conference, May
>> 25-26, San Francisco
>>
>
