Mailing-List: contact solr-user-help@lucene.apache.org; run by ezmlm
Reply-To: solr-user@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: <11B8ADF3AAA0A84C89B1A42CADF918D30793D610@email01.buy.com>
References: <11B8ADF3AAA0A84C89B1A42CADF918D30793D5CB@email01.buy.com>
	<BANLkTinNGQQPELZtDVxwENmrVojH0wcMaA@mail.gmail.com>
	<11B8ADF3AAA0A84C89B1A42CADF918D30793D5D2@email01.buy.com>
	<1303376070502-2846336.post@n3.nabble.com>
	<11B8ADF3AAA0A84C89B1A42CADF918D30793D5E8@email01.buy.com>
	<BANLkTimpq1h_yesE9OBx1kvy1R3UV_NW8g@mail.gmail.com>
	<11B8ADF3AAA0A84C89B1A42CADF918D30793D5F4@email01.buy.com>
	<BANLkTi=WBQEvi=yucxDLFWXREtx6X6be_A@mail.gmail.com>
	<11B8ADF3AAA0A84C89B1A42CADF918D30793D5F6@email01.buy.com>
	<BANLkTimKDw_jOoJJtscp-cfoGb-35fVXDw@mail.gmail.com>
	<11B8ADF3AAA0A84C89B1A42CADF918D30793D5F8@email01.buy.com>
	<11B8ADF3AAA0A84C89B1A42CADF918D30793D60E@email01.buy.com>
	<618169.78572.qm@web130108.mail.mud.yahoo.com>
	<11B8ADF3AAA0A84C89B1A42CADF918D30793D610@email01.buy.com>
Date: Tue, 26 Apr 2011 18:46:18 -0400
Message-ID: <BANLkTimQ=ow_vZc4oSbTxohz2k86gfin6Q@mail.gmail.com>
Subject: Re: term position question from analyzer stack for
 WordDelimiterFilterFactory
From: Erick Erickson <erickerickson@gmail.com>
To: solr-user@lucene.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

I second Otis' comments. Is it possible that you've gotten twisted
around by trying to modify these settings and would be better off
going back to the WDF settings in the example schema? I've
sometimes found that to be very useful.
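For comparison, the WDF lines in the example schema.xml that ships with Solr 1.4/3.x (the "text" field type) look roughly like the following. This is reproduced from memory of the stock example rather than copied from this thread, so check the schema.xml in your own distribution before relying on it. The index side both generates and catenates; the query side only generates:

```xml
<!-- index-time analyzer: split into parts AND catenate them -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"
        splitOnCaseChange="1"/>

<!-- query-time analyzer: split into parts only, never catenate -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0" catenateAll="0"
        splitOnCaseChange="1"/>
```

The asymmetry is deliberate: catenating at index time lets a query typed without the case change or delimiter, for example mcafee, still find a match, without putting overlapping catenated tokens into the parsed query.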

Also (although I don't think it applies in this case) be aware that
the analysis page may introduce its own errors, so when you see
something really wonky, try a query with &debugQuery=on and see
if the parsed query squares with the results on the analysis page...

 Best
Erick

On Tue, Apr 26, 2011 at 5:44 PM, Robert Petersen <robertpe@buy.com> wrote:
> Yeah I am about to try turning one on at a time and see what happens. I
> had a meeting so couldn't do it yet... (darn those meetings) (lol)
>
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Tuesday, April 26, 2011 2:37 PM
> To: solr-user@lucene.apache.org
> Subject: Re: term position question from analyzer stack for
> WordDelimiterFilterFactory
>
> Hi Robert,
>
> I'm no WDFF expert, but all these zeros look suspicious:
>
> org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0,
> generateNumberParts=0, catenateWords=0, generateWordParts=0,
> catenateAll=0, catenateNumbers=0}
>
> A quick visit to
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
> makes me think you want:
>
> splitOnCaseChange=1 (if you want Mc Afee for some reason?)
> generateWordParts=1 (if you want Mc Afee for some reason?)
> preserveOriginal=1
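Otis's three suggestions, written as a query-side filter element, might look like this sketch. The explicit catenate*="0" attributes are an addition (in line with Yonik's advice later in the thread not to catenate at query time), and generateNumberParts is left unset so it takes its default; both are assumptions worth checking against the wiki page linked above.

```xml
<!-- query-side WDF following Otis's suggestions; catenation stays off at query time -->
<filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="1"
        generateWordParts="1"
        preserveOriginal="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"/>
```

With these flags McAfee should come out as the original token plus the parts Mc and Afee; the analysis page is the place to confirm the exact tokens and positions.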
>
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> ----- Original Message ----
>> From: Robert Petersen <robertpe@buy.com>
>> To: solr-user@lucene.apache.org; yonik@lucidimagination.com
>> Sent: Tue, April 26, 2011 4:39:49 PM
>> Subject: RE: term position question from analyzer stack for
>> WordDelimiterFilterFactory
>>
>> OK this is even more weird... everything is working much better except
>> for one thing: I was testing use cases with our top query terms to make
>> sure the below query settings wouldn't break any existing behavior, and
>> got this most unusual result. The analyzer stack completely eliminated
>> the word McAfee from the query terms! I'm like huh? Here is the
>> analyzer page output for that search term:
>>
>> Query Analyzer
>> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
>> term position     1
>> term text         McAfee
>> term type         word
>> source start,end  0,6
>> payload
>> org.apache.solr.analysis.SynonymFilterFactory
>> {synonyms=query_synonyms.txt, expand=true, ignoreCase=true}
>> term position     1
>> term text         McAfee
>> term type         word
>> source start,end  0,6
>> payload
>> org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
>> ignoreCase=true}
>> term position     1
>> term text         McAfee
>> term type         word
>> source start,end  0,6
>> payload
>> org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0,
>> generateNumberParts=0, catenateWords=0, generateWordParts=0,
>> catenateAll=0, catenateNumbers=0}
>> term position
>> term text
>> term type
>> source start,end
>> payload
>> org.apache.solr.analysis.LowerCaseFilterFactory {}
>> term position
>> term text
>> term type
>> source start,end
>> payload
>> com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory
>> {protected=protwords.txt}
>> term position
>> term text
>> term type
>> source start,end
>> payload
>> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
>> term position
>> term text
>> term type
>> source start,end
>> payload
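The rows go empty at the WDF stage. One plausible reading, offered as a hedged interpretation rather than a confirmed diagnosis: splitOnCaseChange is not listed in that output, so it is presumably at its default of 1, which makes WDF treat the case change inside McAfee as a word boundary; with every generate, catenate and preserveOriginal flag set to 0, nothing is left to emit. Tokens with no such boundary pass through WDF unchanged, which would explain why most other terms still worked. If the only goal is to keep McAfee whole at query time, a sketch (assuming you do not need the Mc/Afee parts on the query side) is to switch off case-change splitting:

```xml
<!-- query side: don't split on case changes, so McAfee has no internal boundary
     and passes through WDF as a single token -->
<filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="0"
        generateWordParts="0"
        generateNumberParts="0"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        preserveOriginal="0"/>
```

Note that with these flags, query terms that still contain real delimiters (a hyphen, for instance) would presumably be dropped in the same way McAfee was.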
>>
>>
>>
>> -----Original Message-----
>> From: Robert Petersen [mailto:robertpe@buy.com]
>> Sent: Monday, April 25, 2011 11:27 AM
>> To: solr-user@lucene.apache.org; yonik@lucidimagination.com
>> Subject: RE: term position question from analyzer stack for
>> WordDelimiterFilterFactory
>>
>> Aha! I knew something must be awry, but when I looked at the analysis
>> page output, well it sure looked like it should match. :)
>>
>> OK here is the query side WDF that finally works; I just turned
>> everything off. (yay) First I tried just completely removing WDF from
>> the query side analyzer stack but that didn't work. So anyway I suppose
>> I should turn off the catenate all plus the preserve original settings,
>> reindex, and see if I still get a match huh? (PS thank you very much
>> for the help!!!)
>>
>>            <filter class="solr.WordDelimiterFilterFactory"
>>                    generateWordParts="0"
>>                    generateNumberParts="0"
>>                    catenateWords="0"
>>                    catenateNumbers="0"
>>                    catenateAll="0"
>>                    preserveOriginal="0"
>>                    />
>>
>>
>>
>> -----Original Message-----
>> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
>> Seeley
>> Sent: Monday, April 25, 2011 9:24 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: term position question from analyzer stack for
>> WordDelimiterFilterFactory
>>
>> On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen <robertpe@buy.com>
>> wrote:
>> > The search and index analyzer stacks are the same.
>>
>> Ahhh, they should not be!
>> Using both generate and catenate in WDF at query time is a no-no.
>> Same reason you can't have multi-word synonyms at query time:
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>
>> I'd recommend going back to the WDF settings in the solr example
>> server as a starting point.
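Applied to the chain shown in the analysis output quoted above, separate index and query analyzers could look like the sketch below. The tokenizer, the filter classes and their attributes are copied from that output; the fieldType name (text_products) is hypothetical, and the positionIncrementGap, keeping synonyms on the query side only, and the WDF flags (following the example-schema pattern of catenating at index time only) are assumptions, not Robert's actual schema.

```xml
<!-- hypothetical field type; index and query analyzers deliberately differ -->
<fieldType name="text_products" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- index time: generate the parts AND catenate them -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="query_synonyms.txt"
            expand="true" ignoreCase="true"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- query time: generate the parts but never catenate -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

Changing the index-side analyzer only takes effect for documents indexed afterwards, so a reindex would be needed before comparing results.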
>>
>>
>> -Yonik
>> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
>> 25-26, San Francisco
>>
>