From: Simon Svensson
Date: Thu, 12 Dec 2013 17:42:56 +0100
To: user@lucenenet.apache.org
Subject: Re: Getting fuzzy match information
Message-ID: <52A9E790.6000705@devhost.se>

Hi,

I feel that http://wiki.apache.org/lucene-java/ScoresAsPercentages is somewhat relevant, even if you're not strictly talking about normalizing your scores.

I would look into calling indexSearcher.Explain(Query query, Int32 docId) to retrieve an Explanation object that _should_ contain information about how your document matched, including scores for every term. (The actual user interface for presenting this is left to the reader. Moahahahaa!)

// Simon

On 12/12/13 17:22, Allan, Brad (Bracknell) wrote:
> Has anyone done, or does anyone know of, work that would help me get detailed information about my hits with regard to fuzzy matches? Also very happy to receive suggestions :).
>
> I'm looking to obtain the similarity percentage of each token in each hit.
>
> Example: fuzzy query looks something like this:
> (name:80% similar to "john" or name:80% similar to "henry" or name:80% similar to "smith")
> And I get hits:
>
> * Jon George Smythe
>
> * John Joe Henry
>
> * Smith John & Carter engineering
>
> All valid hits, however my users want to be able to view the similarity and indeed prioritise certain actions by being able to compare the results of 2 different searches (and therefore normalised scores are not as useful as knowing the actual similarity information).
>
> Clearly this sort of ability does not make sense when one is searching in large amounts of data (documents), but in my case I'm searching through a set of names and some additional person information.
>
> Options could be to post-process the hits and use/lift the FuzzyTermEnum logic to re-compute the similarity value. Or perhaps extend the FuzzyQuery to register a 'listener' to receive the information?
> Other ideas? Thoughts?
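
On the post-processing option raised in the quoted message: below is a minimal sketch of re-computing a per-term similarity after the search, without touching Lucene.Net at all. FuzzySimilarity, EditDistance and Similarity are hypothetical names, not part of the library. The formula, 1 - (edit distance / length of the shorter term), is how I recall the classic FuzzyTermEnum scoring a term when no prefix is used; treat that as an assumption and check it against the source of the version you run. Clamping negative values to zero is my own choice.

```csharp
using System;

// Hypothetical helper, not part of Lucene.Net.
public static class FuzzySimilarity
{
    // Plain Levenshtein edit distance, computed with two rolling rows.
    public static int EditDistance(string a, string b)
    {
        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }

            // Reuse the two arrays instead of allocating a new row.
            int[] swap = previous;
            previous = current;
            current = swap;
        }

        return previous[b.Length];
    }

    // Assumed formula: 1 - distance / length of the shorter term.
    // The comparison is case-sensitive, so pass terms in the same form
    // the analyzer produced (typically lowercased).
    public static float Similarity(string queryTerm, string indexedTerm)
    {
        if (queryTerm.Length == 0 || indexedTerm.Length == 0) return 0f;

        int shorter = Math.Min(queryTerm.Length, indexedTerm.Length);
        float similarity = 1f - (float)EditDistance(queryTerm, indexedTerm) / shorter;

        // Very different terms would go negative; clamp (my choice).
        return Math.Max(0f, similarity);
    }
}
```

With this sketch, FuzzySimilarity.Similarity("john", "jon") comes out at roughly 0.67 and FuzzySimilarity.Similarity("smith", "smythe") at 0.6. You would run it for each query term against each token of a hit's name field and keep the best value per query term.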