lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: Several questions about scoring/sorting + random sorting in an image/related application
Date Sun, 17 Jun 2007 06:16:47 GMT

I've really only had a chance to skim this thread so far, but if i
understand correctly, the goal is to get documents back in a "blended"
order based on:
  1) textual relevancy to the search input
  2) recentness
  3) a mapping of field values to arbitrary numeric weights which need to
     be specified at query time (ie: score collection:A better than
     collection:C better than collection:Q etc...)

In that case i think a "function query" is the way to go ...  I haven't
really had a chance to catch up on the way the Solr FunctionQuery class
morphed when it was adopted into the Lucene core, but i believe all the
relevant pieces are in the package, and
it seems to have some good package level javadocs...

You seemed to be on the right track asking about ValueSourceQuery ... but
that's only part of the puzzle: for the "recentness" aspect a
ValueSourceQuery composed on a ReverseOrdFieldSource should take care of
things ... but the arbitrary weighting by "collection" will really require
you to provide your own ValueSource implementation -- most likely you'll
want to leverage the FieldCache, but map your "collectionIds" (whatever
they are) to the numeric values you want to use.
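
To make the "collection" weighting concrete, here is a minimal plain-Java
sketch of the per-document lookup such a custom ValueSource would perform.
It deliberately uses no Lucene types: the class and method names
(CollectionWeights, weightsPerDoc) and the defaultWeight parameter are
hypothetical, and the String[] argument only stands in for the kind of
per-document array a field cache would hand you. Treat it as an
illustration of the idea under those assumptions, not as the Lucene API.

```java
import java.util.Map;

/** Plain-Java sketch (no Lucene types); all names here are hypothetical. */
final class CollectionWeights {
    private CollectionWeights() {}

    /**
     * Builds one weight per document from a mapping supplied at query time.
     *
     * @param collectionByDoc    collectionByDoc[docId] is that document's collection id
     *                           (assumed stand-in for a cached per-document field array)
     * @param weightByCollection query-time mapping from collection id to numeric weight
     * @param defaultWeight      weight for documents whose collection is null or unmapped
     */
    static float[] weightsPerDoc(String[] collectionByDoc,
                                 Map<String, Float> weightByCollection,
                                 float defaultWeight) {
        float[] weights = new float[collectionByDoc.length];
        for (int doc = 0; doc < collectionByDoc.length; doc++) {
            String collection = collectionByDoc[doc];
            Float mapped = (collection == null) ? null : weightByCollection.get(collection);
            weights[doc] = (mapped != null) ? mapped : defaultWeight;
        }
        return weights;
    }
}
```

Computing the array once per query keeps the per-document cost at scoring
time down to a single array read, which is the main reason to lean on a
cached per-document array rather than loading stored fields.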

Then you'll have all the pieces; the only thing left to do will be to
decide if you want to combine them with a regular BooleanQuery or use a

As for your comments about "random scoring" ... this is really, Really,
REALLY hard to get "right" for a variety of reasons that i don't really
want to go into right now ... my advice: don't attempt to commit to
"random" ordering.   Instead commit to promoting N randomly selected
documents to the front of the results ... this is easy to do by writing a
custom query (again ValueSourceQuery can probably help you) where you
pick N random numbers between 0 and maxDoc and score them really high ...
then let the rest of the docs score as they normally would.

In a paginated application, if N is 3 or 4 times the number of results you
show per page, your results will look pretty damn random considering how
few people drill down past page #2.
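
The "promote N random documents" idea can be sketched in plain Java as
follows. This is only an illustration under stated assumptions: the names
(RandomPromotion, pickPromoted, score) and the additive boost are
hypothetical, no Lucene types are used, and in a real index some ids below
maxDoc may belong to deleted documents or to documents that do not match
the query, so fewer than N may actually surface.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/** Plain-Java sketch (no Lucene types); all names here are hypothetical. */
final class RandomPromotion {
    private RandomPromotion() {}

    /** Picks min(n, maxDoc) distinct document ids uniformly from [0, maxDoc). */
    static Set<Integer> pickPromoted(int n, int maxDoc, Random random) {
        int target = Math.max(0, Math.min(n, maxDoc));
        Set<Integer> promoted = new HashSet<>();
        // Rejection sampling: duplicates are simply re-drawn. This is cheap
        // when n is small relative to maxDoc, which is the case described here.
        while (promoted.size() < target) {
            promoted.add(random.nextInt(maxDoc));
        }
        return promoted;
    }

    /** Promoted documents get a large additive boost; all others keep their score. */
    static float score(int docId, float normalScore, Set<Integer> promoted, float boost) {
        return promoted.contains(docId) ? normalScore + boost : normalScore;
    }
}
```

With 10 results per page, for example, calling pickPromoted(40, maxDoc,
new Random()) follows the "3 or 4 times the page size" rule of thumb. The
boost only needs to be larger than any score a normal document can reach.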

