lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wettin <karl.wet...@gmail.com>
Subject Re: Query time document group boosting
Date Thu, 27 Nov 2008 06:30:36 GMT
The most scary part is that that you will have to score each and every  
document that has a source, probably all of the documents in your  
corpus. So if you have a very large number of documents it might be a  
bit expensive. Also, appending this query for boost only means that  
you will get hits on documents that has nothing to do with the user  
query.

I think you are looking for CustomScoreQuery.


     karl

26 nov 2008 kl. 16.54 skrev Toke Eskildsen:

> We use Lucene at our library for indexing from different sources into
> the same logical index. The sources are very diverse and are  
> prioritized
> differently at index-time with document boosts. However, different
> groups of users (or individual users for that matter) have different
> preferences for the relevancy of the sources, which clashes with
> index-time boosting. Query-time tweaking would be preferable.
>
>
> My coworker Mikkel Kamstrup Erlandsen had this bright and slightly  
> scary
> idea...
>
> Suppose we have sources A-Z. For each document from the sources, we  
> add
> the term groupboost_<source>:dummy. All documents from source A has
> groupboost_A:dummy, all documents from source B has groupboost_B:dummy
> and so on.
>
> Now, whenever the user enters a query, we parse it the normal way and
> wrap it in a BooleanQuery where we add our groupterm_<source>:dummy as
> TermQueries with boosts specified by the user (or more realistically
> under the hood by the front-end for the user).
>
> Example: Let's say we have a user that love all things from source A  
> and
> hates the ones from source C. The front-end knows this. The user  
> enters
> the query "foo" which expands to
> "foo OR groupboost_A:dummy^10 OR groupboost_C:dummy^0.1"
> The result should be that there's a high probability that the first  
> hits
> will come from source A, unless there are significantly better matches
> from other groups. Likewise hits from group C will probably be near  
> the
> end of the list of hits.
>
> Presto! The user gets what he wants, practically no search-time  
> penalty,
> simple. One obvious limitation is that we don't want too many groups  
> aka
> sources for this, but in reality we're talking 10-30 groups, so I  
> don't
> see that as a problem.
>
> So what's the scary part? I don't know, I just have a feeling that  
> Here
> Be Dragons. It seems that it should work without messing with ranking,
> besides the specific boost of course, as all documents match exactly  
> one
> "groupboost_<source>:dummy"-query, but I would like to hear the  
> opinion
> of more seasoned Lucene users. Is it a sensible way to approach the
> problem?
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message