lucene-java-user mailing list archives

From: Simon Willnauer
Subject: Re: document diversity
Date: Tue, 06 Oct 2009 21:50:40 GMT
This sounds like a pretty good use case for CustomScoreQuery.
Its package provides flexible programmatic control over document
scores. You can boost documents with the greatest revenue, or throw in
some factors based on potential revenue, even if the revenue "value"
for a document has changed since it was indexed.
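
As a rough illustration of the idea, here is a minimal plain-Java sketch that combines a relevance score with a revenue value looked up at query time. It does not call Lucene at all: the class name, the Hit type, and the log-damped formula are assumptions made up for this example, not anything prescribed by CustomScoreQuery.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java sketch of revenue-weighted scoring; it does not call Lucene. */
final class RevenueBoostSketch {

    /** One hit: document id, text relevance score, and the revenue value known at query time. */
    static final class Hit {
        final String id;
        final float relevance;
        final float revenue;

        Hit(String id, float relevance, float revenue) {
            this.id = id;
            this.relevance = relevance;
            this.revenue = revenue;
        }
    }

    /**
     * Hypothetical combining rule: relevance times a log-damped revenue factor.
     * The log keeps a very large revenue from drowning out relevance entirely.
     */
    static float customScore(float relevance, float revenue) {
        return relevance * (1.0f + (float) Math.log1p(revenue));
    }

    /** Returns document ids ordered by the combined score, highest first. */
    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(
                (Hit h) -> customScore(h.relevance, h.revenue)).reversed());
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("a", 1.0f, 0f));
        hits.add(new Hit("b", 0.8f, 10f));
        hits.add(new Hit("c", 0.2f, 100f));
        System.out.println(rank(hits));
    }
}
```

Because the revenue is read when the score is computed rather than baked into the index, a changed revenue value takes effect on the next search. How strongly revenue should weigh against relevance is a business decision; the damping above is only one possible choice.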

Another way would be to only update the norm value of a document
periodically for documents with changing revenue. This saves you
from re-indexing. See IndexReader#setNorm(int, java.lang.String, byte).
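
One thing to keep in mind with this approach is that a norm is stored as a single byte, so the revenue factor can only be represented coarsely. The plain-Java sketch below models such a one-byte encoding and an in-place update of one document's value. It does not call Lucene; the constants are assumptions based on my recollection of Lucene's small-float scheme, and the class and method names are hypothetical.

```java
/** Plain-Java sketch of a lossy one-byte norm encoding; it does not call Lucene. */
final class NormByteSketch {

    // Assumed parameters of the encoding (not read from any Lucene source here).
    private static final int SHIFT = 21;          // float bits dropped on encode
    private static final int FLOOR = 48 << 3;     // smallest encodable exponent/mantissa pattern

    /** Encodes a float into one byte, truncating precision. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> SHIFT;
        if (small <= FLOOR) {
            // Zero and negative values map to 0; tiny positive values to the smallest step.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= FLOOR + 0x100) {
            return (byte) -1; // too large: clamp to the biggest representable value
        }
        return (byte) (small - FLOOR);
    }

    /** Decodes a byte produced by encode back into a float. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << SHIFT;
        bits += 48 << 24;
        return Float.intBitsToFloat(bits);
    }

    /** Hypothetical in-place update: only the stored byte for one document changes. */
    static void updateNorm(byte[] norms, int doc, float value) {
        norms[doc] = encode(value);
    }

    public static void main(String[] args) {
        byte[] norms = new byte[3];
        updateNorm(norms, 1, 1.3f);
        System.out.println(decode(norms[1]));
    }
}
```

In this sketch values between 1.0 and 2.0 fall into steps of 0.25, so small revenue differences disappear after encoding. If fine-grained revenue ordering matters, a query-time factor is probably the better fit.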

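For the original question of evenly mixing document types, the collector idea Grant describes in the quoted message further down can be approximated as a post-processing step over an already ranked hit list. The following is a minimal plain-Java sketch under that assumption; it does not use Lucene's collector or field cache APIs, and the class, the Hit type, and the round-robin policy are hypothetical choices for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch: interleave ranked hits so document types are evenly represented. */
final class TypeDiversifier {

    /** A ranked hit carrying only what the sketch needs: an id and a document type. */
    static final class Hit {
        final String id;
        final String type;

        Hit(String id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    /**
     * Takes hits in relevance order and returns up to topN ids, picking one hit
     * per type in turn. Order within each type stays by relevance.
     */
    static List<String> diversify(List<Hit> ranked, int topN) {
        // LinkedHashMap keeps types in the order they first appear in the ranking.
        Map<String, Deque<Hit>> byType = new LinkedHashMap<>();
        for (Hit h : ranked) {
            byType.computeIfAbsent(h.type, t -> new ArrayDeque<>()).add(h);
        }
        List<String> out = new ArrayList<>();
        boolean tookAny = true;
        while (out.size() < topN && tookAny) {
            tookAny = false;
            for (Deque<Hit> bucket : byType.values()) {
                if (out.size() >= topN) {
                    break;
                }
                Hit next = bucket.poll(); // null once this type is exhausted
                if (next != null) {
                    out.add(next.id);
                    tookAny = true;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> ranked = new ArrayList<>();
        ranked.add(new Hit("p1", "pdf"));
        ranked.add(new Hit("p2", "pdf"));
        ranked.add(new Hit("h1", "html"));
        ranked.add(new Hit("t1", "txt"));
        System.out.println(diversify(ranked, 3));
    }
}
```

The sketch holds every hit in memory, which is the concern Grant raises; in practice you would likely cap the number of hits considered per type.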

On Tue, Oct 6, 2009 at 10:33 PM, Michael Masters wrote:
> My initial description may have been a little abstract. Maybe I should
> explain exactly what I'm trying to do. My company has various revenue
> channels, one of which is per click. If a user does a search, we would
> like to show results with the greatest revenue, although we don't want
> people to be able to buy all the top results. Hence, we would like to
> have some way of mixing results. The mixing of results could be based
> on potential revenue, relevancy, which revenue stream the result is
> associated with, etc.
> The previously mentioned ideas are great btw.
> -Mike
> On Sat, Oct 3, 2009 at 4:25 PM, Grant Ingersoll wrote:
>> I'm curious, can you elaborate more on the deeper use case for this?
>> Perhaps just implementing faceting on doc type would be sufficient?  That
>> way users can drill in on doc type.  Alternatively, I suppose you could
>> implement a hit collector that accesses a field cache on the doc type field
>> and promotes lesser seen doc types until they are evenly represented.  Could
>> also likely write a Function query that does a similar thing.  I'd imagine
>> you need to be careful to control your memory.
>> -Grant
>> On Oct 1, 2009, at 12:56 PM, Michael Masters wrote:
>>> I was wondering if there is any way to control what kind of documents
>>> are returned from a search. For example, let's say we have an index
>>> built from different types of documents (pdf, txt, html, etc.). Is
>>> there a way to have the first x results have a specified distribution
>>> of document types? It would be nice to have an even number of results
>>> that are from pdfs, txt files, and html files.
>>> Any help would be greatly appreciated.
>>> -Mike
>> --------------------------
>> Grant Ingersoll