lucene-java-user mailing list archives

From Radha Sreedharan <>
Subject Proximity and Percentage match search in Lucene
Date Sun, 19 Apr 2009 12:22:27 GMT
What I need is the following:
If my document field is (ab, bc, cd, ef) and the search tokens are (ab, bc, cd):

Given the above:
I should get a hit even if not all of the search tokens are present.
If the tokens are found, they should be within a distance x of
each other (proximity).
> I need the percentage match of the search tokens with the document field.
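
To make the three requirements concrete, here is a minimal plain-Java sketch that does not use Lucene at all. It assumes "within a distance x" means that the matched tokens fit inside a run of field positions spanning at most x, and it assumes "percentage match" means the share of distinct search tokens found in the best such run. The class and method names (WindowMatch, bestMatchPercent) and both of those definitions are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WindowMatch {

    // Hypothetical helper: percentage of distinct search tokens that occur
    // inside the best window of field positions [start, start + maxDistance].
    static double bestMatchPercent(List<String> field, Set<String> search, int maxDistance) {
        if (search.isEmpty()) {
            return 0.0;
        }
        int best = 0;
        for (int start = 0; start < field.size(); start++) {
            // Last position that is still within maxDistance of the window start.
            int end = Math.min(field.size() - 1, start + maxDistance);
            Set<String> seen = new HashSet<>();
            for (int pos = start; pos <= end; pos++) {
                String token = field.get(pos);
                if (search.contains(token)) {
                    seen.add(token); // count each search token once
                }
            }
            best = Math.max(best, seen.size());
        }
        return 100.0 * best / search.size();
    }

    public static void main(String[] args) {
        List<String> field = Arrays.asList("ab", "bc", "cd", "ef");
        Set<String> search = new HashSet<>(Arrays.asList("ab", "bc", "cd"));
        System.out.println(bestMatchPercent(field, search, 2));
    }
}
```

Note that this measure only looks at how many search tokens are covered, so extra tokens in the field (such as ef) do not lower the result. A measure that should penalise extra field tokens would need a different denominator.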
> Currently, this is how I build my query:
> 1) I form all possible permutations of the search tokens.
> 2) I build a SpanNearQuery for each permutation.
> 3) I combine the SpanNearQuery instances in a DisjunctionMaxQuery.
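
Step 1 above can be sketched in plain Java as follows. This is only an illustration of the permutation step: the class name TokenPermutations is hypothetical, and the Lucene calls from steps 2 and 3 are deliberately left out. Each returned list would supply the clause order for one span query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TokenPermutations {

    // Returns every ordering of the given tokens (n! lists for n distinct tokens).
    static List<List<String>> permutations(List<String> tokens) {
        List<List<String>> out = new ArrayList<>();
        permute(new ArrayList<>(tokens), 0, out);
        return out;
    }

    // Classic swap-based recursion: fix position k, then permute the rest.
    private static void permute(List<String> work, int k, List<List<String>> out) {
        if (k == work.size()) {
            out.add(new ArrayList<>(work)); // copy, because work is mutated later
            return;
        }
        for (int i = k; i < work.size(); i++) {
            Collections.swap(work, k, i);
            permute(work, k + 1, out);
            Collections.swap(work, k, i); // undo the swap before the next choice
        }
    }

    public static void main(String[] args) {
        for (List<String> p : permutations(Arrays.asList("ab", "bc", "cd"))) {
            System.out.println(p);
        }
    }
}
```

The number of permutations grows as n!, so three search tokens give 6 span queries and six tokens already give 720.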
> This is how I compute the % match:
> % match = (score from running the query on the document field) /
>           (score from running the query on a document field created out of the search tokens)
> The numerator gives me the actual score when the search tokens are run against the field.
> The denominator gives me the best possible (maximum) score for the current search.
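
Written as a formula, the ratio described above looks like this. The symbols are assumptions introduced only for this restatement: q is the query built from the search tokens, f is the document field, and f_q is a field created from the search tokens themselves.

```latex
\[
  \%\,\text{match} \;=\;
  \frac{\operatorname{score}(q, f)}{\operatorname{score}(q, f_q)}
\]
```

The ratio is only a meaningful percentage if score(q, f_q) really is the largest score the query can produce, which is the assumption behind the denominator.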
> For this example (document field (ab, bc, cd, ef), search tokens (ab, bc, cd)),
> I expect a % match of around 90%.
> However, I get a match of only around 50% without a boost. Using a boost in fact
> reduces my percentage.
> I even overrode the queryNorm method to return one, but the percentage still did not increase.
Is there any way of implementing this using the current set of
implementation classes in Lucene, without making complex changes to the
structure itself (which is what I gather has to be done, from the
previous replies)?

Can anyone suggest an alternative way of implementing this requirement
using the existing classes in Lucene, not necessarily the ones I have
used?
