lucene-java-user mailing list archives

From Doug Cutting <DCutt...@grandcentral.com>
Subject RE: PrefixQuery Scoring
Date Wed, 13 Feb 2002 23:21:03 GMT
> From: Jonathan Franzone [mailto:jonathan@franzone.com]
> 
> Whenever I add a PrefixQuery to my search the scoring gets really small.
> For example if I do a query like this: +java then the scoring starts
> around 0.866... and so forth. But if I do a query like this: +java* then
> the scoring start like 0.00034... Is there a specific reason for this?

A PrefixQuery is equivalent to a query containing all the terms matching the
prefix, and hence usually contains a lot of terms.  With such a big query,
matching documents are likely to contain only a small fraction of the query
terms, and the match is thus weaker.  For example, the top-scoring document
in a prefix query might contain only one or two of 100 or more query terms.
That's not a very strong match.  But the top-scoring document in a
single-term, non-prefix query is guaranteed to contain all of the query
terms (there is only one), and is thus a much stronger match.

There are of course other factors involved in scoring (e.g., document length
and term frequency).  I call the factor in question here "coordination"
matching: documents which contain more of the query terms score higher.
This is to make the top hits of a boolean "OR" query look like those of a
boolean "AND" of the same terms, with the documents that match only some of
the terms following.
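
To make the effect concrete, here is a rough sketch in Python.  It is a toy
model of the coordination idea only, not Lucene's scoring code: the coord()
and expanded_terms() helpers, the vocabulary, and the sample document are
all made up for illustration, and the other scoring factors are ignored.

```python
def coord(matched_terms: int, total_query_terms: int) -> float:
    """Toy coordination factor: fraction of the query's terms a document contains."""
    if total_query_terms <= 0:
        raise ValueError("query must contain at least one term")
    return matched_terms / total_query_terms


def expanded_terms(prefix: str, vocabulary: set) -> list:
    """Expand a prefix into every vocabulary term that starts with it."""
    return sorted(term for term in vocabulary if term.startswith(prefix))


# Hypothetical index vocabulary and one document's terms (illustrative only).
vocabulary = {"java", "javabean", "javadoc", "javascript", "jakarta"}
doc_terms = {"java", "lucene", "search"}

single_query = ["java"]                              # models: +java
prefix_query = expanded_terms("java", vocabulary)    # models: +java*

# The document contains the one and only term of the single-term query ...
single_coord = coord(len(doc_terms & set(single_query)), len(single_query))
# ... but only one of the four terms the prefix expands to.
prefix_coord = coord(len(doc_terms & set(prefix_query)), len(prefix_query))

print(single_coord, prefix_coord)
```

With a real index the prefix may expand to hundreds of terms, so the
fraction, and with it the score, shrinks accordingly.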

Doug


