lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <>
Subject Re: Proximity and Percentage match search in Lucene
Date Tue, 28 Apr 2009 22:43:43 GMT

Radha: replying/reforwarding the same message over and over doesn't tend 
to be a useful way to encourage additional replies.  if you do have 
something to add to an existing discussion that you've started, you should 
at least do it as a reply to the orriginal discussion so people have the 
full context...

I'm really not sure that anyone has any new additional info to offer you.  
this is a particularly hard problem, that doesn't have a very efficient 
solution that i know of.  it sounds like you've already tried the obvious 
solution, but aren't happy with the scores produced -- if you can't get 
the scores you want by tweaking the Similarity options available, then 
implementing custom Query/Scorer classes is really the only remaining 

: Date: Sun, 19 Apr 2009 17:52:27 +0530
: From: Radha Sreedharan <>
: Reply-To:
: To:
: Subject: Proximity and Percentage match search in Lucene
:  What I need is  the following :
: If my document field is ( ab,bc,cd,ef) and Search tokens are (ab,bc,cd).
:  Given the following :
:  I should get a hit even if all of the search tokens aren't present
:  If the tokens are found they should be found within a distance x of
: each other ( proximity
: search)
: >
: > I need the percentage match of the search tokens with the document field.
: >
: > Currently this is my query :
: > 1) I form all possible permutation of the search tokens
: > 2) do a spanNearQuery of each permutation
: > 3)  Do a DisjunctionMaxQuery on the spannearqueries.
: >
: > This is how I compute % match  :
: > % match =  ( Score by running the query on the document field ) /
: > 		( score by running the query on a document field created out of search tokens )
: >
: > The numerator gives me the actual score with the search tokens run on the field.
: > Denominator gives  me the  best possible or maximum possible score with the current
: tokens
: >
: > For this example << If my document field is ( ab,bc,cd,ef) and Search tokens
: (ab,bc,cd).>> I expect a % match of around 90%.
: >
: > However I get a match of only around 50% without a boost. Using a boost infact reduces
: my percentage.
: >
: > I even overrode the queryNorm method to return a one, still the percentage did not
: *
: Is there any way of implementing this using the current set of
: implementation classes in Lucene and not making complex changes to the
: structure by itself.
: ( which is what i gather has to be done from the previous replies)
: Can anyone suggest an alternative way of implementing this requirement
: using the existing bunch of classes in Lucene and not necessarily
: using the ones I have used*


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message