From: Elmer van Chastelet
Date: Wed, 25 Apr 2012 14:22:01 +0200
To: java-user@lucene.apache.org
Subject: Re: PhoneticFilterFactory 's inject parameter

I keep replying to myself, so this is getting a bit confusing.

The problem still exists, and I don't understand why, or why it worked once. I see the same behavior again as described in my first mail:

- The inject parameter is set to true.
- The index has _no deleted documents_ and is optimized.
- The term 'compete' is in the index.
- If I ask Luke to show all docs for the term 'compete', it shows me the one and only document that represents this word.

But...

- If I then run the query 'value:compete' in Luke, it says there are no results.

Here is the index I'm currently using. It contains various fields for the available phonetic filter encoders:
https://www.box.com/s/34212e82227e102f6734

Can somebody explain this behavior? What is the real use of the inject parameter of the PhoneticFilterFactory?

Thanks in advance.

-Elmer

On 04/25/2012 12:25 PM, Elmer van Chastelet wrote:
> Problem solved.
> Long story short: for some reason I had deleted documents in the
> index, and the non-deleted documents used the phonetic filter with
> inject set to false.
>
> Works fine now :)
>
> On 04/23/2012 09:27 PM, Elmer van Chastelet wrote:
>> Hi all,
>>
>> (scroll to bottom for question)
>>
>> I was setting up a simple web app to play around with phonetic filters.
>> The idea is simple: I create a document for each word in the
>> English dictionary, each document containing a single search field
>> holding the value after it is preprocessed using the following
>> analyzer definition (in our own DSL syntax, which gets transformed to Java):
>>
>>   analyzer soundslike{
>>     tokenizer = KeywordTokenizer
>>     tokenfilter = LowerCaseFilter
>>     tokenfilter = PhoneticFilter(encoder="DoubleMetaphone", inject="true")
>>   }
>>
>> I can run the web app and I get results that indeed (in some way)
>> sound like the original query term.
>>
>> But what confuses me is the ranking of the results, given that I
>> set the inject param to true. If I search for the query term
>> 'compete', the parsed query becomes '(value:KMPT value:compete)', and
>> therefore I expect the word 'compete' to be ranked higher
>> than any other word... but this wasn't the case.
>>
>> Looking further at the explanation of the results, I saw that the term
>> 'compete' from the parsed query is totally absent, and only the
>> phonetic encoding seems to affect the ranking:
>>
>>   * COMPETITOR
>>     4.368826 = (MATCH) sum of:
>>       4.368826 = (MATCH) weight(value:KMPT in 3174), product of:
>>         0.52838135 = queryWeight(value:KMPT), product of:
>>           8.26832 = idf(docFreq=150, maxDocs=216555)
>>           0.063904315 = queryNorm
>>         8.26832 = (MATCH) fieldWeight(value:KMPT in 3174), product of:
>>           1.0 = tf(termFreq(value:KMPT)=1)
>>           8.26832 = idf(docFreq=150, maxDocs=216555)
>>           1.0 = fieldNorm(field=value, doc=3174)
>>
>> The next thing I did was run our friend Luke.
>> In Luke, I opened the documents tab and started iterating over the
>> terms for the field 'value' until I found 'compete'. When I hit
>> 'Show All Docs', the search tab opens and displays the one and only
>> document holding this value (i.e. the document representing the word
>> 'compete'). It shows the query: 'value:compete '. Then, when I hit
>> the search button again (the query is still 'value:compete '), it says
>> that there are no results!?
>>
>> Probably the 'Show All Docs' button does something different from
>> performing a query using the search tab in Luke.
>>
>> Q: Can somebody explain why the injected original terms seem to get
>> ignored at query time? Or might it be related to the name of the search
>> field ('value'), or something else?
>>
>> We use Lucene 3.1 with Solr analyzers (via Hibernate Search 3.4.2).
>>
>> -Elmer
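
The numbers in the quoted score explanation can be recomputed by hand, which may help when reading it. The following is only a minimal sketch, assuming the Lucene 3.x DefaultSimilarity formulas (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(termFreq), queryNorm = 1 / sqrt(sum of squared idf values)) and no boosts. The class and method names are illustrative and are not part of Lucene. The docFreq of 0 used for 'value:compete' is an assumption, not something stated in the thread; it is simply the value that reproduces the reported queryNorm.

```java
// Recomputes the figures from the quoted explanation.
// Assumes Lucene 3.x DefaultSimilarity formulas and no boosts;
// ExplainCheck and its methods are hypothetical helper names.
final class ExplainCheck {

    // idf = 1 + ln(maxDocs / (docFreq + 1))
    static float idf(int docFreq, int maxDocs) {
        return (float) (Math.log(maxDocs / (double) (docFreq + 1)) + 1.0);
    }

    // tf = sqrt(termFreq)
    static float tf(float termFreq) {
        return (float) Math.sqrt(termFreq);
    }

    // queryNorm = 1 / sqrt(sum of squared idf values of all query clauses)
    static float queryNorm(float... idfs) {
        double sumOfSquares = 0.0;
        for (float v : idfs) {
            sumOfSquares += (double) v * v;
        }
        return (float) (1.0 / Math.sqrt(sumOfSquares));
    }

    // Score of one term clause = queryWeight * fieldWeight
    static float termScore(int termFreq, int docFreq, int maxDocs,
                           float queryNorm, float fieldNorm) {
        float idf = idf(docFreq, maxDocs);
        float queryWeight = idf * queryNorm;
        float fieldWeight = tf(termFreq) * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        int maxDocs = 216555;                 // from the explanation
        float idfKmpt = idf(150, maxDocs);    // docFreq=150 from the explanation
        float idfCompete = idf(0, maxDocs);   // ASSUMPTION: docFreq=0 for value:compete
        float norm = queryNorm(idfKmpt, idfCompete);

        System.out.println("idf(value:KMPT) = " + idfKmpt);
        System.out.println("queryNorm       = " + norm);
        System.out.println("score           = " + termScore(1, 150, maxDocs, norm, 1.0f));
    }
}
```

Under these assumptions the sketch should land close to the reported idf (8.26832), queryNorm (0.063904315) and score (4.368826). That the queryNorm only comes out right with a second clause of docFreq 0 would be consistent with 'value:compete' being part of the query but matching no document in the index that was searched, rather than being dropped from the query.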