lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Multiple Query clauses impacting result
Date Wed, 03 Aug 2011 23:58:29 GMT

: So in a business scenario where we have to make a decision based on the
: "accepted" matching of a document (say perform activity A only when a
: document matches more than 50%), we won't be able to rely on the match score
: because the score will change based on our query, and sometimes an 80% match
: may not be as close as a 5% match with a slightly different query. (I know
: I am going back to % again :)
: 
: So how do we handle such a scenario?

you have to redefine your criteria.  "50% match" is meaningless -- you have 
to decide what that means: does it mean matching half of the clauses in a 
boolean query?  what if a doc matches only 1/3 of the clauses, but it 
matches them 100 times each? what if it matches 1/2 the clauses, 100 times 
each, but that only makes up a tiny fraction of the total terms in that 
document (ie: it's got the entire contents of wikipedia in every field)?  
what if the query isn't a boolean query but a phrase query?
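
To make the ambiguity concrete, here is a small plain-Java sketch (no Lucene
involved) of two equally plausible readings of "percent match". The class
name, method names, and sample documents are all made up for illustration;
neither measure is what Lucene computes as a score.

```java
import java.util.List;
import java.util.Map;

public class MatchDefinitions {

    // Reading 1: fraction of query terms that occur at least once in the doc.
    static double clauseCoverage(List<String> queryTerms, Map<String, Integer> docTermFreqs) {
        if (queryTerms.isEmpty()) {
            return 0.0;
        }
        int matched = 0;
        for (String term : queryTerms) {
            if (docTermFreqs.getOrDefault(term, 0) > 0) {
                matched++;
            }
        }
        return (double) matched / queryTerms.size();
    }

    // Reading 2: fraction of the doc's term occurrences that are query terms.
    static double docCoverage(List<String> queryTerms, Map<String, Integer> docTermFreqs) {
        long total = 0;
        long hits = 0;
        for (Map.Entry<String, Integer> entry : docTermFreqs.entrySet()) {
            total += entry.getValue();
            if (queryTerms.contains(entry.getKey())) {
                hits += entry.getValue();
            }
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        List<String> query = List.of("lucene", "score", "percent", "match");

        // Hypothetical documents, given as term -> frequency.
        Map<String, Integer> shortDoc = Map.of("lucene", 1, "score", 1);
        Map<String, Integer> hugeDoc = Map.of("lucene", 100, "score", 100, "other", 999800);

        // Both docs match the same two of the four query terms ...
        System.out.println(clauseCoverage(query, shortDoc));
        System.out.println(clauseCoverage(query, hugeDoc));
        // ... but the query terms are all of one doc and almost none of the other.
        System.out.println(docCoverage(query, shortDoc));
        System.out.println(docCoverage(query, hugeDoc));
    }
}
```

Both documents are a "50% match" under the first reading, while the second
reading puts them at opposite ends of the scale. Which one is right depends
entirely on the rule you actually care about.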

if you have a constrained set of possible queries, and you can define 
precisely what rules you care about, you can modify your similarity class 
so that, regardless of the index, it produces scores that you *can* use to 
make inferences, given your rules.
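
As a rough illustration of that idea, the sketch below is a toy model in
plain Java, not Lucene code and not Lucene's real scoring formula. It
assumes the rule you care about is "what fraction of the query clauses
matched at all", so every index-dependent factor is flattened to a
constant. The method names only loosely mirror the kind of hooks a
similarity class exposes (term frequency, inverse document frequency,
length normalization); the exact methods to override depend on your Lucene
version, so treat the names here as hypothetical.

```java
public class FlatScoringSketch {

    // Term frequency flattened: a clause either matched or it didn't.
    static float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    // Inverse document frequency ignored: rare and common terms count the same.
    static float idf(int docFreq, int numDocs) {
        return 1.0f;
    }

    // Length normalization ignored: long and short fields count the same.
    static float lengthNorm(int numTermsInField) {
        return 1.0f;
    }

    // Toy score: average per-clause contribution.
    // freqs[i] is how often clause i matched in the doc,
    // docFreqs[i] is how many docs in the index contain clause i's term.
    static float score(int[] freqs, int[] docFreqs, int numDocs, int fieldLength) {
        if (freqs.length == 0) {
            return 0.0f;
        }
        float sum = 0.0f;
        for (int i = 0; i < freqs.length; i++) {
            sum += tf(freqs[i]) * idf(docFreqs[i], numDocs) * lengthNorm(fieldLength);
        }
        return sum / freqs.length;
    }

    public static void main(String[] args) {
        int[] freqs = {3, 0, 100, 0};   // two of the four clauses matched

        // Same doc and query against two very different (made-up) indexes.
        float small = score(freqs, new int[] {10, 5, 2, 7}, 1000, 50);
        float large = score(freqs, new int[] {90000, 12, 400000, 3}, 5000000, 80000);

        System.out.println(small);
        System.out.println(large);
        System.out.println(small >= 0.5f ? "perform activity A" : "skip");
    }
}
```

With every index-dependent factor pinned to a constant, the score is just
matched clauses divided by total clauses, so a fixed threshold means the
same thing no matter what else is in the index. The trade-off is that the
score stops being useful for ordinary relevance ranking.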

See Also...
	http://www.gossamer-threads.com/lists/lucene/java-user/61075
	http://markmail.org/thread/3svvskbay4hpqyms
	http://markmail.org/message/lztdm4xosmceup5t
And a real oldie but goodie...
	http://markmail.org/message/5eipstcu6lky2h2j


-Hoss
