lucene-java-user mailing list archives

From Antoine Baudoux ...@taktik.be>
Subject Re: Several questions about scoring/sorting + random sorting in an image/related application
Date Sun, 17 Jun 2007 15:37:09 GMT
Hi Chris,

>
> I've really only had a chance to skim this thread so far, but if I
> understand correctly, the goal is to get documents back in a "blended"
> order based on:
>   1) textual relevancy to the search input
>   2) recentness
>   3) a mapping of field values to arbitrary numeric weights which need to
>      be specified at query time (i.e., score collection:A better than
>      collection:C better than collection:Q, etc.)
>

You have perfectly understood my question, thanks for trying to help!

> In that case I think a "function query" is the way to go ... I haven't
> really had a chance to catch up on the way the Solr FunctionQuery class
> morphed when it was adopted into the Lucene core, but I believe all the
> relevant pieces are in the org.apache.lucene.search.function package, and
> it seems to have some good package-level javadocs...
>
That's what I discovered. The question is: is ValueSourceQuery robust and
fast enough to be used confidently in a production environment? I looked at
the source code and it seems pretty straightforward, so I would say yes, as
long as I use the caches correctly. Can you confirm?



> http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/function/package-summary.html
>
> You seemed to be on the right track asking about ValueSourceQuery ... but
> that's only part of the puzzle: for the "recentness" aspect a
> ValueSourceQuery composed on a ReverseOrdFieldSource should take care of
> things ... but the arbitrary weighting by "collection" will really require
> you to provide your own ValueSource implementation -- most likely you'll
> want to leverage the FieldCache, but map your "collectionIds" (whatever
> they are) to the numeric values you want to use.
>
> Then you'll have all the pieces; the only thing left to do will be to
> decide if you want to combine them with a regular BooleanQuery or use a
> CustomScoreQuery.
>
Yes, I will have to implement my own ValueSource, but it seems it's really
not complicated, looking at the existing ValueSource implementations.
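
To make the collection-weighting idea concrete, here is a minimal sketch of
the mapping step only. It is plain Java and deliberately does not use the
Lucene API: the per-document array of collection ids stands in for whatever
the FieldCache would return, and the class name, method name and default
weight are all hypothetical. A real ValueSource implementation would wrap
this kind of lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
final class CollectionWeightSketch {

    // collectionIdByDoc[doc] is assumed to hold the collection id of each
    // document (the kind of per-document array a field cache provides).
    // weightByCollection is the query-time mapping from id to numeric weight.
    static float[] weightsPerDoc(String[] collectionIdByDoc,
                                 Map<String, Float> weightByCollection,
                                 float defaultWeight) {
        float[] weights = new float[collectionIdByDoc.length];
        for (int doc = 0; doc < weights.length; doc++) {
            Float w = weightByCollection.get(collectionIdByDoc[doc]);
            // Collections without an explicit weight fall back to the default.
            weights[doc] = (w != null) ? w : defaultWeight;
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Float> weights = new HashMap<>();
        weights.put("A", 3.0f);
        weights.put("C", 2.0f);
        float[] perDoc = weightsPerDoc(new String[] {"A", "Q", "C"}, weights, 1.0f);
        for (int doc = 0; doc < perDoc.length; doc++) {
            System.out.println("doc " + doc + " weight " + perDoc[doc]);
        }
    }
}
```

Precomputing the array once per query keeps the per-document work during
scoring down to a single array read.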

>
> As for your comments about "random scoring" ... this is really, Really,
> REALLY hard to get "right" for a variety of reasons that I don't really
> want to go into right now ... my advice: don't attempt to commit to
> "random" ordering. Instead commit to promoting N randomly selected
> documents to the front of the results ... this is easy to do by writing a
> custom query (again ValueSourceQuery can probably help you) where you
> pick N random numbers between 0 and maxDoc and score them really high ...
> then let the rest of the docs score as they normally would.
>
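
The promotion approach suggested above can be sketched roughly as follows.
This is plain Java with no Lucene dependency; the class, the method names,
the seed parameter and the boost value are assumptions made for
illustration, and a real version would live inside a custom query or
ValueSource.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
final class RandomPromotionSketch {

    // Pick up to n distinct document ids in [0, maxDoc).
    static Set<Integer> pickPromoted(int maxDoc, int n, long seed) {
        Random rnd = new Random(seed);
        Set<Integer> picked = new HashSet<>();
        int target = Math.min(n, maxDoc); // cannot pick more ids than exist
        while (picked.size() < target) {
            picked.add(rnd.nextInt(maxDoc));
        }
        return picked;
    }

    // Promoted documents get a large boost; all others keep their normal score.
    static float score(int doc, float normalScore, Set<Integer> promoted, float boost) {
        return promoted.contains(doc) ? normalScore + boost : normalScore;
    }
}
```

Note that ids below maxDoc can belong to deleted documents, so fewer than N
promoted documents may actually appear in the results.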
What's wrong with this idea:
Each day I generate and shuffle a vector of maxDoc integers from 0 to
maxDoc.

Then I use a ValueSourceQuery with a ValueSource that uses this vector to
randomly score the documents. Of course, I have to somehow normalize those
random scores so that their "contribution factor" remains constant when
maxDoc increases.
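
A minimal sketch of that idea, in plain Java without the Lucene API: the
class and method names are hypothetical, the day number is assumed to be
used as the random seed so the ordering stays stable within a day, and
dividing each rank by maxDoc is just one possible normalization that keeps
every score in [0, 1) however large the index grows.

```java
import java.util.Random;

// Hypothetical helper, not part of Lucene.
final class DailyShuffleSketch {

    // Returns one pseudo-random score in [0, 1) per document id.
    static float[] dailyRandomScores(int maxDoc, long daySeed) {
        int[] perm = new int[maxDoc];
        for (int i = 0; i < maxDoc; i++) {
            perm[i] = i;
        }
        // Fisher-Yates shuffle, seeded so the same day gives the same order.
        Random rnd = new Random(daySeed);
        for (int i = maxDoc - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = perm[i];
            perm[i] = perm[j];
            perm[j] = tmp;
        }
        // Normalize the shuffled ranks so the range does not depend on maxDoc.
        float[] scores = new float[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            scores[doc] = (float) perm[doc] / maxDoc;
        }
        return scores;
    }
}
```

The seed could be taken from java.time.LocalDate.now().toEpochDay(). The
array would need to be rebuilt whenever maxDoc changes, which also changes
the random order for that day.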


Thanks for your advice!


Antoine
