From: "Robert Petersen"
Date: Tue, 26 Apr 2011 13:39:49 -0700
Subject: RE: term position question from analyzer stack for WordDelimiterFilterFactory

OK this is even more weird... everything is working much better except for one thing: I was testing use cases with our top query terms to make sure the below query settings wouldn't break any existing behavior, and got this most unusual result. The analyzer stack completely eliminated the word McAfee from the query terms! I'm like huh?

Here is the analyzer page output for that search term:

Query Analyzer

org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position     1
term text         McAfee
term type         word
source start,end  0,6
payload

org.apache.solr.analysis.SynonymFilterFactory {synonyms=query_synonyms.txt, expand=true, ignoreCase=true}
term position     1
term text         McAfee
term type         word
source start,end  0,6
payload

org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
term position     1
term text         McAfee
term type         word
source start,end  0,6
payload

org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0, catenateNumbers=0}
term position
term text
term type
source start,end
payload

org.apache.solr.analysis.LowerCaseFilterFactory {}
term position
term text
term type
source start,end
payload

com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory {protected=protwords.txt}
term position
term text
term type
source start,end
payload

org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position
term text
term type
source start,end
payload

-----Original Message-----
From: Robert Petersen [mailto:robertpe@buy.com]
Sent: Monday, April 25, 2011 11:27 AM
To: solr-user@lucene.apache.org; yonik@lucidimagination.com
Subject: RE: term position question from analyzer stack for WordDelimiterFilterFactory

Aha! I knew something must be awry, but when I looked at the analysis page output, well it sure looked like it should match. :)

OK here is the query side WDF that finally works, I just turned everything off. (yay) First I tried just completely removing WDF from the query side analyzer stack but that didn't work. So anyway I suppose I should turn off the catenate all plus the preserve original settings, reindex, and see if I still get a match huh? (PS thank you very much for the help!!!)

-----Original Message-----
From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik Seeley
Sent: Monday, April 25, 2011 9:24 AM
To: solr-user@lucene.apache.org
Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory

On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen wrote:
> The search and index analyzer stack are the same.

Ahhh, they should not be!
Using both generate and catenate in WDF at query time is a no-no.
Same reason you can't have multi-word synonyms at query time:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

I'd recommend going back to the WDF settings in the solr example server as a starting point.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
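
For anyone puzzling over the empty WordDelimiterFilterFactory row in the analyzer output above, here is a rough Python sketch of what is probably going on. It is a simplified toy model, not Solr's code: the function names and the splitting rules are assumptions made for illustration, and it assumes splitOnCaseChange is left at its default of on. The idea is that "McAfee" splits into the subwords "Mc" and "Afee" at the case change, and with every generate/catenate/preserve option set to 0 there is nothing left to emit, whereas an all-lowercase token with no delimiters passes through untouched.

```python
import re
from itertools import groupby


def split_subwords(token, split_on_case_change=True):
    """Toy subword splitter: delimiters, letter/digit boundaries, case changes."""
    subwords = []
    # Break on any run of non-alphanumeric characters first.
    for part in re.split(r"[^A-Za-z0-9]+", token):
        # Then separate runs of letters from runs of digits.
        for piece in re.findall(r"[A-Za-z]+|[0-9]+", part):
            if split_on_case_change and piece.isalpha():
                # Insert a break at each lower-to-upper transition.
                subwords.extend(re.sub(r"(?<=[a-z])(?=[A-Z])", " ", piece).split())
            else:
                subwords.append(piece)
    return subwords


def word_delimiter(token, generate_word_parts=0, generate_number_parts=0,
                   catenate_words=0, catenate_numbers=0, catenate_all=0,
                   preserve_original=0, split_on_case_change=1):
    """Toy model of the filter's output for one token (hypothetical helper)."""
    subwords = split_subwords(token, bool(split_on_case_change))
    # Assumption: a token that is a single subword is passed through as-is.
    if subwords == [token]:
        return [token]
    out = []
    if preserve_original:
        out.append(token)
    for sw in subwords:
        if sw.isdigit():
            if generate_number_parts:
                out.append(sw)
        elif generate_word_parts:
            out.append(sw)
    # Catenate adjacent runs of word parts or number parts if requested.
    for is_num, run in groupby(subwords, key=str.isdigit):
        run = list(run)
        wanted = catenate_numbers if is_num else catenate_words
        if wanted and len(run) > 1:
            out.append("".join(run))
    if catenate_all and len(subwords) > 1:
        out.append("".join(subwords))
    return out


print(word_delimiter("McAfee"))                          # []
print(word_delimiter("mcafee"))                          # ['mcafee']
print(word_delimiter("McAfee", generate_word_parts=1))   # ['Mc', 'Afee']
```

If this model is close to the real behavior, turning every option off on the query side is what drops mixed-case terms, and enabling generateWordParts (or preserveOriginal) on the query side would keep them. Check it against the analysis page on your own Solr version before relying on it.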