Date: Tue, 26 Apr 2011 18:46:18 -0400
Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory
From: Erick Erickson
To: solr-user@lucene.apache.org

I second Otis' comments. Is it possible that you've gotten twisted around by trying to modify these settings and would be better off going back to the WDF settings in the example schema? I've sometimes found that to be very useful.

Also (although I don't think it applies in this case), be aware that the analysis page may introduce its own errors. When you see something really wonky, try a query with &debugQuery=on and see whether the parsed query squares with the results on the analysis page...
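For reference, here is roughly what the query-side WDF settings in the example schema look like. This is reconstructed from memory of the Solr 1.4/3.x example schema.xml rather than copied from it, so treat it as an assumption and compare it with the schema.xml shipped with your release. With generateWordParts="1" and splitOnCaseChange="1", a token like McAfee should come out of WDF as Mc and Afee instead of vanishing. With the catenate options off, the query side never emits both split parts and a joined token for the same input.

```xml
<!-- Query-side WDF, approximating the Solr example schema (verify against your release) -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        splitOnCaseChange="1"/>
```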
Best
Erick

On Tue, Apr 26, 2011 at 5:44 PM, Robert Petersen wrote:
> Yeah, I am about to try turning them on one at a time and see what happens. I
> had a meeting so couldn't do it yet... (darn those meetings) (lol)
>
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Tuesday, April 26, 2011 2:37 PM
> To: solr-user@lucene.apache.org
> Subject: Re: term position question from analyzer stack for
> WordDelimiterFilterFactory
>
> Hi Robert,
>
> I'm no WDFF expert, but all these zeros look suspicious:
>
> org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0,
> generateNumberParts=0, catenateWords=0, generateWordParts=0,
> catenateAll=0, catenateNumbers=0}
>
> A quick visit to
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
> makes me think you want:
>
> splitOnCaseChange=1 (if you want Mc Afee for some reason?)
> generateWordParts=1 (if you want Mc Afee for some reason?)
> preserveOriginal=1
>
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> ----- Original Message ----
>> From: Robert Petersen
>> To: solr-user@lucene.apache.org; yonik@lucidimagination.com
>> Sent: Tue, April 26, 2011 4:39:49 PM
>> Subject: RE: term position question from analyzer stack for
>> WordDelimiterFilterFactory
>>
>> OK, this is even more weird... everything is working much better except
>> for one thing. I was testing use cases with our top query terms to make
>> sure the query settings below wouldn't break any existing behavior, and
>> got this most unusual result: the analyzer stack completely eliminated
>> the word McAfee from the query terms! I'm like, huh?
>> Here is the analyzer page output for that search term:
>>
>> Query Analyzer
>>
>> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
>> term position     1
>> term text         McAfee
>> term type         word
>> source start,end  0,6
>> payload
>>
>> org.apache.solr.analysis.SynonymFilterFactory
>> {synonyms=query_synonyms.txt, expand=true, ignoreCase=true}
>> term position     1
>> term text         McAfee
>> term type         word
>> source start,end  0,6
>> payload
>>
>> org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
>> ignoreCase=true}
>> term position     1
>> term text         McAfee
>> term type         word
>> source start,end  0,6
>> payload
>>
>> org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0,
>> generateNumberParts=0, catenateWords=0, generateWordParts=0,
>> catenateAll=0, catenateNumbers=0}
>> term position
>> term text
>> term type
>> source start,end
>> payload
>>
>> org.apache.solr.analysis.LowerCaseFilterFactory {}
>> term position
>> term text
>> term type
>> source start,end
>> payload
>>
>> com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory
>> {protected=protwords.txt}
>> term position
>> term text
>> term type
>> source start,end
>> payload
>>
>> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
>> term position
>> term text
>> term type
>> source start,end
>> payload
>>
>>
>>
>> -----Original Message-----
>> From: Robert Petersen [mailto:robertpe@buy.com]
>> Sent: Monday, April 25, 2011 11:27 AM
>> To: solr-user@lucene.apache.org; yonik@lucidimagination.com
>> Subject: RE: term position question from analyzer stack for
>> WordDelimiterFilterFactory
>>
>> Aha! I knew something must be awry, but when I looked at the analysis
>> page output, well, it sure looked like it should match.
>> :)
>>
>> OK, here is the query-side WDF that finally works; I just turned
>> everything off (yay). First I tried completely removing WDF from
>> the query-side analyzer stack, but that didn't work. So anyway, I suppose
>> I should turn off the catenateAll and preserveOriginal settings on the
>> index side, reindex, and see if I still get a match, huh? (PS: thank you
>> very much for the help!!!)
>>
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             generateWordParts="0"
>>             generateNumberParts="0"
>>             catenateWords="0"
>>             catenateNumbers="0"
>>             catenateAll="0"
>>             preserveOriginal="0"
>>             />
>>
>>
>>
>> -----Original Message-----
>> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
>> Seeley
>> Sent: Monday, April 25, 2011 9:24 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: term position question from analyzer stack for
>> WordDelimiterFilterFactory
>>
>> On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen
>> wrote:
>> > The search and index analyzer stacks are the same.
>>
>> Ahhh, they should not be!
>> Using both generate and catenate in WDF at query time is a no-no,
>> for the same reason you can't have multi-word synonyms at query time:
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>
>> I'd recommend going back to the WDF settings in the Solr example
>> server as a starting point.
>>
>>
>> -Yonik
>> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
>> 25-26, San Francisco
>>
>
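A minimal sketch of Yonik's advice applied to the filter chain shown in the analysis output in this thread: give the field separate index and query analyzers so that catenation happens only at index time. The fieldType name text_wdf is hypothetical, and the WDF values follow the example-schema pattern as I remember it (generate and catenate on the index side, generate only on the query side), so treat them as an assumption to check against your own schema.xml. The synonym filter stays on the query side only because that is where the posted stack applies it. Any change to the index analyzer requires a reindex before query results will reflect it.

```xml
<!-- Hypothetical field type; filter chain taken from the analysis output in this thread -->
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- Index time: generate parts AND catenate, so McAfee yields Mc, Afee and the joined McAfee -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="query_synonyms.txt" expand="true" ignoreCase="true"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- Query time: generate parts only, no catenation (per Yonik's note above) -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

With this split, a query for McAfee should produce mc and afee at query time. Both terms also exist in the index, so the term no longer disappears the way it does with every WDF flag set to 0.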