lucene-java-user mailing list archives

From Li Li <fancye...@gmail.com>
Subject Re: two fields, the first important than the second
Date Fri, 27 Apr 2012 03:12:26 GMT
You should describe your ranking strategy more precisely.
If the query has two terms, "hello" and "world" for example, and your search
fields are title and description, there are many possible combinations.
Here is my understanding.
Both terms should occur in title or desc.
    The query may be +(title:hello desc:hello) +(title:world desc:world)
    The problem is that we need title to weigh more than desc, so we may
rewrite it to
   +(title:hello^2 desc:hello) +(title:world^2 desc:world)
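
For more than two terms, the boosted query string above can be generated rather than written by hand. The following is only a minimal sketch in plain Java: the class and method names are hypothetical, it assumes the terms are already analyzed and contain no characters that need escaping in the query syntax, and it only produces the string (parsing it is left to whatever query parser you already use).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Lucene.
class RequiredTermsQuery {
    // Emits one required clause per term: +(title:term^boost desc:term).
    // Assumption: terms need no escaping for the query syntax.
    static String build(List<String> terms, int titleBoost) {
        return terms.stream()
                .map(t -> "+(title:" + t + "^" + titleBoost + " desc:" + t + ")")
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("hello", "world"), 2));
    }
}
```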
    But consider these two scenarios:
    1. hello hits only in title, world hits only in desc
    2. hello and world both hit in desc
    Because title is boosted, 1 scores higher than 2.
    But we may think 2 is better than 1 because "hello world" is a phrase.
But we don't want to use a phrase query because it is so strict that the
recall cannot meet our needs.
   Our solution is to modify Lucene so that the boolean scorer can tell us which
terms matched. Then we use our own collector to boost scenario 2. This
solution requires modifying Lucene (I have posted a mail about it, and you can patch your
DisjunctionSumScorer with https://issues.apache.org/jira/browse/LUCENE-2686).
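
To make the collector idea concrete, here is a rough sketch of the re-scoring decision only, in plain Java with no Lucene classes. It assumes the patched scorer can somehow hand the collector a map from field name to the set of query terms that matched in that field; that map, the class name, the method name and the boost value are all hypothetical and are not taken from the LUCENE-2686 patch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical re-scoring rule a custom collector might apply.
class SameFieldBoost {
    // Multiplies the score by sameFieldBoost when at least one field
    // matched every query term; otherwise returns the score unchanged.
    static float adjust(float score, Map<String, Set<String>> matchedByField,
                        Set<String> queryTerms, float sameFieldBoost) {
        for (Set<String> matched : matchedByField.values()) {
            if (matched.containsAll(queryTerms)) {
                return score * sameFieldBoost;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("hello", "world"));

        // hello matched only in title, world matched only in desc
        Map<String, Set<String>> splitAcrossFields = new HashMap<>();
        splitAcrossFields.put("title", new HashSet<>(Arrays.asList("hello")));
        splitAcrossFields.put("desc", new HashSet<>(Arrays.asList("world")));

        // hello and world both matched in desc
        Map<String, Set<String>> bothInDesc = new HashMap<>();
        bothInDesc.put("desc", new HashSet<>(Arrays.asList("hello", "world")));

        System.out.println(adjust(3.0f, splitAcrossFields, terms, 4.0f));
        System.out.println(adjust(2.0f, bothInDesc, terms, 4.0f));
    }
}
```

The input scores here are made-up numbers, not real Lucene scores; the point is only that a document with both terms in one field can be lifted above one whose terms are split across fields.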
   Another solution I can come up with is to use a more complicated query:
   +(title:hello desc:hello) +(title:world desc:world)
   (+title:hello +title:world)^10 (+desc:hello +desc:world)^5
   The must-occur conditions are the same as before, but if hello and world
are both in title, we give the document a boost. Similarly, if hello and world
are both in desc, we also boost it.
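
The same query can be assembled for any number of terms. Again this is just a sketch in plain Java: the class and method names are hypothetical, the boosts 10 and 5 are simply the example values from above, and the terms are assumed to need no escaping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Lucene.
class CombinedQuery {
    // Required part: every term must occur in title or desc.
    static String required(List<String> terms) {
        return terms.stream()
                .map(t -> "+(title:" + t + " desc:" + t + ")")
                .collect(Collectors.joining(" "));
    }

    // Optional part: matches (and adds a boosted score) only when
    // all terms occur in the given field.
    static String allInField(String field, List<String> terms, int boost) {
        String clauses = terms.stream()
                .map(t -> "+" + field + ":" + t)
                .collect(Collectors.joining(" "));
        return "(" + clauses + ")^" + boost;
    }

    static String build(List<String> terms, int titleBoost, int descBoost) {
        return required(terms)
                + " " + allInField("title", terms, titleBoost)
                + " " + allInField("desc", terms, descBoost);
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("hello", "world"), 10, 5));
    }
}
```

Because the two boosted clauses are optional, they never change which documents match; they only add score when all terms fall in one field.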


On Fri, Apr 27, 2012 at 3:12 AM, Akos Tajti <akos.tajti@gmail.com> wrote:

> Dear List,
>
> we've been struggling with the following problem for a while:
> we have two fields: title and description. Title is generated from short
> summaries while description is generated from long texts. We want to search
> on both fields at the same time but we'd like to get all documents in which
> the title matches the search term before all others. For multi term queries
> we want to achieve the following: all documents that contain all terms in
> their title must come before every other document, no matter how many times
> the description matches the query. Is there a simple way to achieve this?
>
> Thanks in advance,
> Ákos Tajti
>
