lucene-solr-user mailing list archives

From Rob Brown <...@intelcompute.com>
Subject Re: negative boosts for docs with common field value
Date Tue, 11 Oct 2011 22:56:07 GMT
The setup in this question was a simplification of the actual environment;
we're not actually demoting popular authors.

Perhaps index-time (negative) boosts are indeed the only way.


-- 

IntelCompute
Web Design and Online Marketing

http://www.intelcompute.com


-----Original Message-----
From: Chris Hostetter <hossman_lucene@fucit.org>
Reply-to: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
Subject: Re: negative boosts for docs with common field value
Date: Tue, 11 Oct 2011 15:37:03 -0700 (PDT)

: Some searches will obviously be saturated by docs from any given author if
: they've simply written more.
: 
: I'd like to give a negative boost to these matches, thereby making sure that
: 1 Author doesn't saturate the results just because they've written 500
: documents, compared to others who may have only written 2-3 documents.
: 
: The actual author value doesn't matter, I just want to bring down the score of
: docs by any common author to give more varied results.
: 
: What's the easiest approach for this, and is it even possible at query time?
: I could do this at index time but would prefer a Solr solution.

w/o a custom plugin, the only way i know of to do something like this 
would be to index a numeric "author_prolificness" field in each doc and 
use that as the basis of a function query.
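
A minimal sketch of that idea, not a tested configuration. It assumes each doc already carries the numeric "author_prolificness" field (the author's total doc count, computed at index time), that the edismax parser and its multiplicative "boost" parameter are available, and that the "title" and "text" fields in qf exist; those field names and the recip constants are purely illustrative. The small recip() helper only mirrors Solr's recip(x,m,a,b) = a / (m*x + b) so the effect of the constants can be checked locally.

```python
from urllib.parse import urlencode


def recip(x, m, a, b):
    """Local mirror of Solr's recip(x,m,a,b) function: a / (m*x + b)."""
    return a / (m * x + b)


def build_params(user_query):
    """Build request parameters that scale each score by a prolificness factor."""
    return {
        "defType": "edismax",
        "q": user_query,
        # Hypothetical searchable fields; substitute the real schema's fields.
        "qf": "title text",
        # Multiplier shrinks as author_prolificness grows (constants are illustrative).
        "boost": "recip(author_prolificness,1,10,10)",
    }


if __name__ == "__main__":
    # Query string to append to a select handler URL.
    print(urlencode(build_params("nightfall")))
    # Compare the multiplier for an author with 2 docs and one with 500 docs.
    for doc_count in (2, 500):
        print(doc_count, recip(doc_count, 1, 10, 10))
```

With these constants a doc by an author with 2 docs keeps roughly 83% of its score, while a doc by an author with 500 docs keeps roughly 2%, so the constants would need careful tuning before use.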

but honestly: i *really* don't think you want to do this - not if you are 
dealing with real user queries (maybe if this is for some synthetically 
generated "related documents" or "interesting documents" query)

Imagine a user is searching for a *very* specific title (ie: "Nightfall") 
by a very prolific author ("Isaac Asimov").  What you're describing would 
penalize the desired match just because the author is prolific -- even if 
the user types in the exact title of a document, so that some much more 
esoteric document with the same title by an author who has written nothing 
else ("Stephen Leather") would likely score higher.


I mean: if someone types in "Romeo and Juliet" do you really want to score 
documents by "Shakespeare" lower than documents by "Stanley W. Wells" just 
because Wells has written fewer total books?



-Hoss

