lucene-dev mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Question: using boost for sorting
Date Thu, 17 Oct 2002 05:19:07 GMT
It just occurred to me that this diff is really pretty useless.
The methods I added don't do anything by themselves...
I just added new methods to those 3 classes, but I don't see where
IndexWriter and Searcher use Similarity, and Similarity currently
doesn't use the instance that was set by setDefaultSimilarity.

And Similarity's public methods are static.  In order for the new
Similarity instance to be used (the one specified in
setDefaultSimilarity(Similarity)) we could make Similarity a
singleton, make its methods non-static, add a
Similarity.getDefaultSimilarity() method, and then replace calls like
this:

idf = Similarity.idf(term, searcher);

with

idf = Similarity.getDefaultSimilarity().idf(term, searcher);
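
A rough sketch of that idea, under stated assumptions: idf here takes
(docFreq, numDocs) rather than the real (term, searcher) arguments so the
example stands alone, DefaultSimilarity and FlatSimilarity are hypothetical
names, and the idf formula is only illustrative, not necessarily what
Lucene computes.

```java
// Sketch only: not Lucene's actual source. Only Similarity, idf,
// setDefaultSimilarity and getDefaultSimilarity come from this thread;
// everything else is a hypothetical name used for illustration.
abstract class Similarity {
    // The globally installed instance that callers fetch via getDefaultSimilarity().
    private static Similarity defaultImpl = new DefaultSimilarity();

    public static void setDefaultSimilarity(Similarity sim) {
        if (sim == null) {
            throw new IllegalArgumentException("sim must not be null");
        }
        defaultImpl = sim;
    }

    public static Similarity getDefaultSimilarity() {
        return defaultImpl;
    }

    // Non-static, so subclasses can override it.
    // Assumption: simplified arguments instead of (term, searcher).
    public abstract float idf(int docFreq, int numDocs);
}

// Stands in for "what the current class does"; the formula is illustrative.
class DefaultSimilarity extends Similarity {
    @Override
    public float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
}

// A hypothetical override that ignores term rarity altogether.
class FlatSimilarity extends DefaultSimilarity {
    @Override
    public float idf(int docFreq, int numDocs) {
        return 1.0f;
    }
}

class SimilarityDemo {
    public static void main(String[] args) {
        // Former static call sites now go through the installed instance.
        System.out.println(Similarity.getDefaultSimilarity().idf(3, 100));
        Similarity.setDefaultSimilarity(new FlatSimilarity());
        System.out.println(Similarity.getDefaultSimilarity().idf(3, 100));
    }
}
```

With this shape, setDefaultSimilarity actually changes behavior, because
every call site reads the installed instance instead of a static method.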


Is this what you had in mind, Doug?

Otis


--- Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> Here are the diffs for:
>   Similarity.java
>   IndexWriter.java
>   Searcher.java
> 
> The changes were minimal, everything should still work the same way as
> before.  Similarity's public methods are all static, so making this
> class abstract makes no difference to the outside callers of its public
> methods.
> 
> Otis
> 
> 
> --- Doug Cutting <cutting@lucene.com> wrote:
> > Please submit diffs before committing anything, as this is delicate
> > code.  Small changes here can affect performance in a big way.
> > 
> > Also, we must be extra-careful when making a new public API: once a
> > method is public it's very hard to remove it.  The Similarity methods
> > also need to be well documented.
> > 
> > Doug
> > 
> > Otis Gospodnetic wrote:
> > > This sounds good to me, as it would lead us to pluggable similarity
> > > computation...mmmm.
> > > I can refactor some of this tonight.
> > > 
> > > Otis
> > > 
> > > 
> > > --- Doug Cutting <cutting@lucene.com> wrote:
> > > 
> > > >>This looks like a good approach.  When I get a chance, I'd like to
> > > >>make Similarity an interface or an abstract class, whose default
> > > >>implementation would do what the current class does, but whose
> > > >>methods can be overridden.  Then I'd add methods like:
> > > >>
> > > >>   public static void Similarity.setDefaultSimilarity(Similarity sim);
> > > >>   public void IndexWriter.setSimilarity(Similarity sim);
> > > >>   public void Searcher.setSimilarity(Similarity sim);
> > > >>
> > > >>So to override Similarity methods you'd define a subclass of the
> > > >>standard implementation, then either install yours globally via
> > > >>setDefaultSimilarity, or set it in your IndexWriter before adding
> > > >>documents and in your Searcher before searching.  Does that sound
> > > >>reasonable?
> > > >>
> > > >>This would let you do what you describe below without changing
> > > >>Lucene's sources.  However I'm very short on time right now and don't
> > > >>know how soon I'll get to this.
> > >>
> > >>Doug
> > >>
> > >>David Birtwell wrote:
> > >>
> > > >>>Hi Dmitry,
> > > >>>
> > > >>>I was faced with a similar problem.  We wanted to have a numeric rank
> > > >>>field in each document influence the order in which the documents were
> > > >>>returned by lucene.  While investigating a solution for this, I wanted
> > > >>>to see if I could implement strict sorting based on this numeric value.
> > > >>>I was able to accomplish this using document boosting, but not without
> > > >>>modifying the lucene source.  Our "ranking" field is an integer value
> > > >>>from one to one hundred.  I'm not sure if this will help you, but I'll
> > > >>>include a summary of what I did.
> > > >>>
> > > >>>In DocumentWriter remove the normalization by field length:
> > > >>>   float norm = fieldBoosts[n] * Similarity.normalizeLength(fieldLengths[n]);
> > > >>>to
> > > >>>   float norm = fieldBoosts[n];
> > > >>>
> > > >>>In TermScorer and PhraseScorer, modify the score() method to ignore the
> > > >>>lucene base score:
> > > >>>   score *= Similarity.decodeNorm(norms[d]);
> > > >>>to
> > > >>>   score = Similarity.decodeNorm(norms[d]);
> > > >>>
> > > >>>In Similarity.java, make byteToFloat() public.
> > > >>>
> > > >>>At index time, use Similarity.byteToFloat() to determine your boost
> > > >>>value as in the following pseudocode:
> > > >>>   Document d = new Document();
> > > >>>   ... add your fields ...
> > > >>>   int rank = d.getField("RANK");   // range of rank can be 0 to 255
> > > >>>   float sortVal = Similarity.byteToFloat(rank)
> > > >>>   d.setBoost(sortVal)
> > > >>>
> > > >>>If you'd like the reasoning behind any or all of these items, let me know.
> > > >>>
> > > >>>DaveB
> > >>>
> > >>>
> > >>>
> > >>>Dmitry Serebrennikov wrote:
> > >>>
> > >>>
> > > >>>>Greetings Everyone,
> > > >>>>
> > > >>>>I'm thinking of trying to build something that manipulates a query
> > > >>>>score in order to achieve a sort order other than the default
> > > >>>>relevance sort. The idea is to create a new type of query:
> > > >>>>SortingQuery( Query query, String sortByField )
> > > >>>>
> > > >>>>It would run the sub-query and return results in an order of the
> > > >>>>values found in the "sortByField" for those documents. Now, I've
> > > >>>>looked at all of the sorting discussion prior to this, and the best
> > > >>>>approach (recommended by Doug among others) is to provide some sort of
> > > >>>>a fast access to the field values inside the HitCollector. Reading
> > > >>>>documents at search time is too slow, so people access the data
> > > >>>>elsewhere or build an in-memory index of that data (such as is done in
> > > >>>>the SearchBean's SortField).
> > > >>>>
> > > >>>>My idea is different. I want to try to do the following:
> > > >>>>- compose a query that consists of the original sub-query followed by
> > > >>>>a special "sorting query"
> > > >>>>- "boost" the score of the original sub-query to 0
> > > >>>>- compute the score of the sorting query such that it would reflect
> > > >>>>the desired sort order
> > > >>>>
> > > >>>>Has anyone tried to do something like this?
> > > >>>>Would this work?
> > > >>>>Is this worth doing?
> > > >>>>If it would, would then I have to do something during the indexing
> > > >>>>time to set normalization / scoring factors for that field to
> > > >>>>something or other?
> > > >>>>
> > > >>>>Thanks.
> > > >>>>Dmitry.
> > >>>>
> > >>>>
> > >>>>
> > >>>>-- 
> > >>>>To unsubscribe, e-mail:   
> > >>>><mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > >>>>For additional commands, e-mail: 
> > >>>><mailto:lucene-user-help@jakarta.apache.org>
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>>-- 
> > >>>To unsubscribe, e-mail:   
> > >>><mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > >>>For additional commands, e-mail: 
> > >>><mailto:lucene-user-help@jakarta.apache.org>
> > >>>
> > >>
> > >>
> > >>--
> > >>To unsubscribe, e-mail:  
> > >><mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > >>For additional commands, e-mail:
> > >><mailto:lucene-user-help@jakarta.apache.org>
> > >>
> > > 
> > > 
> > > __________________________________________________
> > > Do you Yahoo!?
> > > Faith Hill - Exclusive Performances, Videos & More
> > > http://faith.yahoo.com
> > > 
> > > --
> > > To unsubscribe, e-mail:  
> > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > > For additional commands, e-mail:
> > <mailto:lucene-user-help@jakarta.apache.org>
> > > 
> > 
> > 
> > 
> > --
> > To unsubscribe, e-mail:  
> > <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> > <mailto:lucene-dev-help@jakarta.apache.org>
> > 

> ATTACHMENT part 2 application/octet-stream name=Similarity.diff


> ATTACHMENT part 3 application/octet-stream name=IndexWriter.diff


> ATTACHMENT part 4 application/octet-stream name=Searcher.diff

--
To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>

