lucene-java-user mailing list archives

From Jeremy Volkman <jvolk...@gmail.com>
Subject Re: Base score to use for custom query?
Date Tue, 27 Apr 2010 22:42:39 GMT
Hi Hoss,

I didn't end up writing my own query (well I did, but all it does is rewrite
into another query). I found DisjunctionMaxQuery, which seemed a good fit
for what I was trying to do. Instead of TermQuery, I used ConstantScoreQuery
combined with TermsFilter to create queries that weren't dependent upon the
Term's scores. For each ConstantScoreQuery, I set the boost much as you
suggested.
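
A minimal sketch of that boost scheme, assuming the clause for the last number in the search range gets a boost of 1 and each step toward the start of the range adds 1. The class and method names are hypothetical, and the code only computes the boost values; it does not call any Lucene API.

```java
final class RangeBoosts {
    // Hypothetical helper: boost for the clause matching `value`
    // within the inclusive search range [start, end].
    // The end of the range gets 1; each step toward the start adds 1.
    static float boostFor(int start, int end, int value) {
        if (value < start || value > end) {
            throw new IllegalArgumentException("value outside search range");
        }
        return (end - value) + 1;
    }

    public static void main(String[] args) {
        // Print the boost for every clause of the example range [10, 19].
        for (int v = 10; v <= 19; v++) {
            System.out.println(v + " -> boost " + boostFor(10, 19, v));
        }
    }
}
```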

What's the difference in this case between using a DisjunctionMaxQuery,
which is what I've done, and using a BooleanQuery with disabled coord? And,
if I set omit norms, will TermQuery essentially return constant scores for
terms? Does the use of DMQ + CSQ + TermsFilter throw up any red flags in
your experience?
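
To make the first question concrete, here is a simplified model, not Lucene's actual scoring. It assumes every clause scores exactly its boost (as a constant-score clause would, ignoring queryNorm), that a BooleanQuery with coord disabled adds up the scores of the matching clauses, and that a DisjunctionMaxQuery with a tie-breaker of 0 keeps only the highest one. All names are hypothetical.

```java
final class ScoreCombination {
    // Boost for `value` in the inclusive search range [start, end]:
    // 1 at the end of the range, growing by 1 per step toward the start.
    static float boostFor(int start, int end, int value) {
        return (end - value) + 1;
    }

    // Models a sum over matching clauses (BooleanQuery, coord disabled).
    // The document holds the terms docFirst..docLast, inclusive.
    static float sumScore(int qStart, int qEnd, int docFirst, int docLast) {
        float total = 0f;
        for (int v = Math.max(qStart, docFirst); v <= Math.min(qEnd, docLast); v++) {
            total += boostFor(qStart, qEnd, v);
        }
        return total;
    }

    // Models a max over matching clauses (DisjunctionMaxQuery, tie-breaker 0).
    static float maxScore(int qStart, int qEnd, int docFirst, int docLast) {
        float best = 0f;
        for (int v = Math.max(qStart, docFirst); v <= Math.min(qEnd, docLast); v++) {
            best = Math.max(best, boostFor(qStart, qEnd, v));
        }
        return best;
    }

    public static void main(String[] args) {
        // Search range [19, 30]; document A holds 10..19, document B holds 20..29.
        System.out.println("A sum=" + sumScore(19, 30, 10, 19) + " max=" + maxScore(19, 30, 10, 19));
        System.out.println("B sum=" + sumScore(19, 30, 20, 29) + " max=" + maxScore(19, 30, 20, 29));
    }
}
```

Under this model, for the search range [19, 30], a document holding terms 10..19 scores 12 either way, while a document holding 20..29 scores 65 when summed but 11 when only the maximum is kept. So in this example only the max-based combination ranks the first document higher.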

Thanks again,
Jeremy

On Tue, Apr 27, 2010 at 2:14 PM, Chris Hostetter
<hossman_lucene@fucit.org> wrote:

>
> First off: if you haven't already, make sure you OMIT_NORMS when indexing
> this field, that way you don't have to worry about docs with "lots" of
> numbers scoring low purely because of the fieldNorm.
>
> Second: i wouldn't bother with a custom query, i would stick with your
> BooleanQuery approach, but make sure you do two things:
>
> 1) give each of your TermQueries a boost based on how far it is
> from the end of the range. so if you have a range like [10 19] give the
> 19 clause a boost of 1, the 18 clause a boost of 2, the 17 clause a boost
> of 3, etc...
>
> 2) disable the coord.  there is an option on BooleanQuery to do this, and
> it will make sure docs that only match one clause in your BooleanQuery
> don't get a penalty compared to docs that match many clauses in your
> BooleanQuery -- which is going to be important in ensuring that your
> boosts are useful.
>
> That should get you what you want, and if not then take a look at the
> score explanations and see if anything obvious jumps out -- post a
> followup with your code and the score explanations if you can't solve it
> to your liking.
>
> : I have a field in my document that contains a range of numbers. Say, for
> : example, the universe of numbers is the range of integers from 0-100. My
> : field represents a subrange of those numbers in a token stream. So, for
> : example, if one document contains 20-30, its token stream contains the
> : terms [20, 21, 22, ..., 29]. Now I can quickly find all documents that
> : contain some number.
> :
> : The next part of the problem is searching for all documents that intersect
> : with some subrange of numbers. Somewhat like a range query, but not exactly.
> : Say I want to search for all documents that touch the range [10, 30]. My
> : original implementation was to simply create a BooleanQuery full of
> : TermQuerys for each term in the range i was searching for. While this
> : returned the proper results, it did so with skewed scores. I'd prefer
> : documents containing numbers towards the beginning of my search range to be
> : scored higher than docs towards the end. So, if I had two documents, one
> : with 10-20, and one with 20-30, and I searched for [19,30], both documents
> : would be returned, but the second would be much more highly scored due to
> : its higher number of matched terms.
> :
> : So, my plan is to write a custom query which matches documents in
> : my range in a way such as:
>
>
> -Hoss
