lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Changing the Scoring api for OR parameters
Date Fri, 08 Sep 2006 18:08:41 GMT

If you are already setting the document boost based on the "date" of the
Document, then the next thing you should familiarize yourself with is the
Similarity.coord function.  Its specific purpose is for dealing with
Queries which "aggregate" other queries (like a BooleanQuery does with
its clauses) to let you determine how much "penalty" documents receive
for only matching on a subset of the clauses.

(the conventional wisdom being that a document matching more clauses is
"better" than a document matching fewer clauses, even if the aggregate
score of the second document is a little higher than the first)

If you don't like that conventional wisdom, you can make the coord method
of your Similarity return a constant value (like "1") to eliminate its
impact completely, or define some function that has a lower impact than
the overlap/maxOverlap algorithm used in the DefaultSimilarity.
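
To make the effect concrete, here is a minimal standalone sketch. It does
not use Lucene at all: the two coord methods only mirror the shape of
Similarity.coord(int overlap, int maxOverlap), and the class name, the
score() helper and the sample numbers are made up for illustration. Real
Lucene scoring has more factors than "sum of clause scores times coord".

```java
// Standalone sketch, no Lucene dependency. Names and numbers are illustrative.
class CoordSketch {
    // Same shape as the DefaultSimilarity behaviour described above:
    // the fraction of query clauses that the document matched.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Constant coord: matching only some of the clauses carries no penalty.
    static float constantCoord(int overlap, int maxOverlap) {
        return 1.0f;
    }

    // Hypothetical, simplified score: summed clause scores times coord.
    static float score(float clauseScoreSum, float coord) {
        return clauseScoreSum * coord;
    }

    public static void main(String[] args) {
        int maxOverlap = 2; // two clauses, as in "Microsoft OR Sun"

        // Assumed inputs: a newer doc matching one clause with a high boost,
        // and an older doc matching both clauses with a lower boost.
        float newerSum = 1.0f; // matches 1 of 2 clauses
        float olderSum = 0.8f; // matches 2 of 2 clauses

        System.out.println("default coord:  newer="
                + score(newerSum, defaultCoord(1, maxOverlap))
                + " older=" + score(olderSum, defaultCoord(2, maxOverlap)));
        System.out.println("constant coord: newer="
                + score(newerSum, constantCoord(1, maxOverlap))
                + " older=" + score(olderSum, constantCoord(2, maxOverlap)));
    }
}
```

With the overlap/maxOverlap form the older two-clause document comes out
ahead; with the constant form the boost alone decides the order. Note that
a constant coord only removes the penalty: a document matching more
clauses can still win if its summed clause scores are higher.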

Since eliminating the impact completely is a common need among
"artificially" constructed queries (like Prefix queries and Wildcard
queries) there is actually a constructor for BooleanQuery that takes in a
boolean to do this.



Lastly: if you want more exact control over the way the dates of your
documents influence the score (without mucking with norms to make them
floats instead of bytes) consider using FunctionQuery ... searching the
archives will pop up several examples of how it can be used, and you can
find it in the Solr code base...

http://incubator.apache.org/solr/
http://incubator.apache.org/solr/docs/api/org/apache/solr/search/function/package-summary.html
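
As a rough idea of what "a function of the date" could look like, here is
a standalone sketch of a reciprocal decay, a / (m*x + b), applied to a
document's age. It does not call FunctionQuery or any Solr class: the
class name, the choice of "age in days" as x, and the constants are all
assumptions for illustration only.

```java
// Standalone sketch, no Solr/Lucene dependency. All names are hypothetical.
class RecencySketch {
    // Reciprocal decay: a / (m * x + b). Larger x (older doc) gives a smaller value.
    static float reciprocal(float x, float m, float a, float b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        // Assumed constants: with m = a = b = 1 a brand-new document scores 1.0
        // and the value falls off as the age in days grows.
        float m = 1.0f, a = 1.0f, b = 1.0f;
        float[] ageInDays = {0f, 1f, 7f, 30f};
        for (float age : ageInDays) {
            System.out.println("age=" + age + " days -> " + reciprocal(age, m, a, b));
        }
    }
}
```

The point of a function like this is that the date contribution is
computed at full float precision at query time, instead of being squeezed
into the norm at index time.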


: Date: Fri, 8 Sep 2006 09:45:13 +0200
: From: Marcus Falck <marcus.falck@observer.se>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Changing the Scoring api for OR parameters
:
: Hi everyone,
:
:
:
: I want to override the default scoring when it comes to queries
: containing the OR operator.
:
:
:
: For example if I got the following headlines in my index :
:
: "Sun sues Microsoft"
:
: "Microsoft want to buy Tiscali"
:
: ".NU domain sues Microsoft"
:
: "The sun is shining"
:
: "Sun brings antitrust suit against Microsoft"
:
:
:
: Those documents have been boosted in descending order ("Sun sues Microsoft"
: has a higher calculated norm value than "Sun brings antitrust suit against
: Microsoft"),
:
: The similarity class that has been used makes the norm values exactly
: equal to the boost value ( I have even modified the norm to be a float
: so I won't lose precision ).
:
:
:
: If I perform a search for: Microsoft OR Sun
:
:
:
: The top-ranked results will almost certainly be:
:
: Sun sues Microsoft
:
: Sun Brings antitrust suit against Microsoft
:
: ....
:
:
:
: I just want the documents returned like this:
:
: "Sun sues Microsoft"
:
: "Microsoft want to buy Tiscali"
:
: ".NU domain sues Microsoft"
:
: "The sun is shining"
:
: "Sun brings antitrust suit against Microsoft"
:
:
:
: I have to get this to work since I'm indexing news material and the
: customers are only interested in the newest articles ( so the date of
: the article is being used as a boost factor).
:
:
:
: Any ideas? My rank changes to Lucene work as expected when it comes to
: the AND operator and single term queries.
:
:
:
: /
:
: Regards
:
:
:
: Marcus Falck
:
:
:
:



-Hoss

