lucene-java-user mailing list archives

From Li Li <fancye...@gmail.com>
Subject Re: two fields, the first important than the second
Date Fri, 27 Apr 2012 03:17:46 GMT
Sorry for the typos. The corrected queries are:
original query: +(title:hello desc:hello) +(title:world desc:world)
boosted query:  +(title:hello^2 desc:hello) +(title:world^2 desc:world)
last query:     +(title:hello desc:hello) +(title:world desc:world)
                (+title:hello +title:world)^10 (+desc:hello +desc:world)^5

The example has two terms. With more terms, the query becomes too
complicated to write by hand.
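One way to keep the last query manageable for any number of terms is to generate it. A minimal sketch, assuming Lucene's classic QueryParser syntax as used above, the field names `title` and `desc`, and plain single-word terms that need no escaping; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.StringJoiner;

public class TwoFieldQuery {
    // Builds the query string from the thread for any number of terms:
    //   +(title:t1 desc:t1) ... +(title:tn desc:tn)
    //   (+title:t1 ... +title:tn)^titleBoost (+desc:t1 ... +desc:tn)^descBoost
    // Assumes the terms contain no query-parser special characters.
    static String build(List<String> terms, int titleBoost, int descBoost) {
        StringJoiner query = new StringJoiner(" ");
        // Required part: each term must occur in title or desc.
        for (String t : terms) {
            query.add("+(title:" + t + " desc:" + t + ")");
        }
        // Optional bonus groups: all terms in the same field.
        query.add(allInField("title", terms) + "^" + titleBoost);
        query.add(allInField("desc", terms) + "^" + descBoost);
        return query.toString();
    }

    // Produces "(+field:t1 +field:t2 ...)".
    static String allInField(String field, List<String> terms) {
        StringJoiner group = new StringJoiner(" ", "(", ")");
        for (String t : terms) {
            group.add("+" + field + ":" + t);
        }
        return group.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(List.of("hello", "world", "lucene"), 10, 5));
    }
}
```

Called with `List.of("hello", "world")`, 10 and 5, it returns the last query above on a single line. The result is a string for a query parser; with the Java API the equivalent would be a nested `BooleanQuery` built the same way.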

On Fri, Apr 27, 2012 at 11:12 AM, Li Li <fancyerii@gmail.com> wrote:

> You should describe your ranking strategy more precisely.
> If the query has two terms, "hello" and "world" for example, and your
> search fields are title and description, there are many possible
> combinations.
> Here is my understanding.
> Both terms should occur in title or desc, so
>     the query may be +(title:hello desc:hello) +(title:world desc:world)
>     The problem is that we want title to weigh more than desc, so we may
> rewrite it to
>    +(title:hello^2 desc:hello) +(title:world^2 desc:world)
>     But consider these two scenarios:
>     1. hello hits only in title, world hits only in desc
>     2. hello and world both hit in desc
>     Because title is boosted, scenario 1 scores higher than scenario 2.
>     But we may consider 2 better than 1, because "hello world" is a phrase.
> We don't want to use a phrase query, though, because it is so strict that
> the recall can't meet our needs.
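The ranking problem with the two scenarios can be seen under a deliberately simplified scoring model, assumed here for illustration only: each matching clause adds its boost, and tf, idf, length normalization and coord are ignored, so the numbers are not what Lucene would actually compute. Under the boosted query `+(title:hello^2 desc:hello) +(title:world^2 desc:world)`, scenario 1 still comes out ahead. The class name is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BoostScenarios {
    // Toy scoring model (an assumption, not Lucene's similarity):
    // a matching title clause adds titleBoost, a matching desc clause adds 1.
    static double score(Map<String, Set<String>> doc, List<String> terms, double titleBoost) {
        double total = 0;
        for (String t : terms) {
            boolean inTitle = doc.getOrDefault("title", Set.of()).contains(t);
            boolean inDesc = doc.getOrDefault("desc", Set.of()).contains(t);
            if (!inTitle && !inDesc) {
                return 0; // a required (+) clause fails, so the document does not match
            }
            if (inTitle) total += titleBoost;
            if (inDesc) total += 1;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("hello", "world");
        // Scenario 1: hello only in title, world only in desc.
        Map<String, Set<String>> s1 = Map.of("title", Set.of("hello"), "desc", Set.of("world"));
        // Scenario 2: hello and world both in desc.
        Map<String, Set<String>> s2 = Map.of("title", Set.of(), "desc", Set.of("hello", "world"));
        System.out.println(score(s1, terms, 2) + " vs " + score(s2, terms, 2)); // 3.0 vs 2.0
    }
}
```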
>    Our solution is to modify Lucene so that the boolean scorer can tell us
> which terms matched; then we use our own collector to boost scenario 2.
> This solution requires modifying Lucene (I have posted a mail about it,
> and you can patch your DisjunctionSumScorer with
> https://issues.apache.org/jira/browse/LUCENE-2686)
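The collector idea can be sketched in plain Java, applied to already-collected hits rather than inside a real Lucene `Collector`. It assumes each hit carries, for every query term, the set of fields that term matched in, which is the kind of information the patched scorer is meant to expose; the `Hit` record, the bonus factor and the example scores are all hypothetical (Java 16+ for records). A hit's score is multiplied when a single field contains every query term, so scenario 2, the one considered better above, can overtake scenario 1.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MatchAwareRerank {
    // Hypothetical hit: base score plus, for each query term, the fields it matched in.
    record Hit(String id, double score, Map<String, Set<String>> fieldsByTerm) {}

    // True if some single field contains every query term.
    static boolean allTermsInOneField(Hit h, List<String> terms, List<String> fields) {
        for (String f : fields) {
            boolean all = true;
            for (String t : terms) {
                if (!h.fieldsByTerm().getOrDefault(t, Set.of()).contains(f)) {
                    all = false;
                    break;
                }
            }
            if (all) return true;
        }
        return false;
    }

    // Multiplies the score of "all terms in one field" hits by bonus, then sorts by score descending.
    static List<Hit> rerank(List<Hit> hits, List<String> terms, List<String> fields, double bonus) {
        List<Hit> out = new ArrayList<>();
        for (Hit h : hits) {
            double s = allTermsInOneField(h, terms, fields) ? h.score() * bonus : h.score();
            out.add(new Hit(h.id(), s, h.fieldsByTerm()));
        }
        out.sort(Comparator.comparingDouble(Hit::score).reversed());
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("hello", "world");
        List<String> fields = List.of("title", "desc");
        // Scenario 1: higher base score because of the title boost.
        Hit s1 = new Hit("s1", 3.0, Map.of("hello", Set.of("title"), "world", Set.of("desc")));
        // Scenario 2: both terms in desc.
        Hit s2 = new Hit("s2", 2.0, Map.of("hello", Set.of("desc"), "world", Set.of("desc")));
        for (Hit h : rerank(List.of(s1, s2), terms, fields, 2.0)) {
            System.out.println(h.id() + " " + h.score());
        }
    }
}
```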
>    Another solution I can come up with is a more complicated query:
>    +(title:hello desc:hello) +(title:world desc:world)
>    (+title:hello +title:world)^10 (+desc:hello +desc:world)^5
>    The required (must) clauses are the same as before, but if hello and
> world both occur in the title, we give the document a boost. Similarly, if
> hello and world both occur in desc, we boost it too.
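Under a similarly simplified model, the extra optional groups reverse the order of the two scenarios. Assumption for illustration: each matching term clause adds 1 and each fully satisfied group adds its boost as a flat bonus, whereas in Lucene a boost scales the group's score rather than adding a constant. Scenario 2 now collects the `desc` bonus while scenario 1 gets none. The class name is hypothetical.

```java
import java.util.List;
import java.util.Set;

public class CombinedQueryScore {
    // Toy model (assumption, not Lucene's similarity): each matching term clause adds 1,
    // and each fully satisfied optional group adds its boost as a flat bonus.
    static double score(Set<String> title, Set<String> desc, List<String> terms,
                        double titleBonus, double descBonus) {
        double total = 0;
        for (String t : terms) {
            boolean inTitle = title.contains(t);
            boolean inDesc = desc.contains(t);
            if (!inTitle && !inDesc) {
                return 0; // required clause +(title:t desc:t) fails, document does not match
            }
            if (inTitle) total += 1;
            if (inDesc) total += 1;
        }
        // Optional groups (+title:t1 ... +title:tn)^titleBonus and (+desc:t1 ... +desc:tn)^descBonus.
        if (title.containsAll(terms)) total += titleBonus;
        if (desc.containsAll(terms)) total += descBonus;
        return total;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("hello", "world");
        // Scenario 1: hello only in title, world only in desc.
        System.out.println(score(Set.of("hello"), Set.of("world"), terms, 10, 5));
        // Scenario 2: both terms in desc.
        System.out.println(score(Set.of(), Set.of("hello", "world"), terms, 10, 5));
        // Both terms in title.
        System.out.println(score(Set.of("hello", "world"), Set.of(), terms, 10, 5));
    }
}
```

In this model the order becomes: both terms in title, then both in desc, then the split match.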
>
>
>
> On Fri, Apr 27, 2012 at 3:12 AM, Akos Tajti <akos.tajti@gmail.com> wrote:
>
>> Dear List,
>>
>> we've been struggling with the following problem for a while:
>> we have two fields: title and description. Title is generated from short
>> summaries, while description is generated from long texts. We want to
>> search on both fields at the same time, but we'd like to get all documents
>> in which the title matches the search term before all others. For
>> multi-term queries we want to achieve the following: all documents that
>> contain all terms in their title must come before every other document, no
>> matter how many times the description matches the query. Is there a
>> simple way to achieve this?
>>
>> Thanks in advance,
>> Ákos Tajti
>>
>
>
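The original requirement (every document with all terms in its title before every other document) is stricter than a boost: a boost only shifts scores, so on its own it does not guarantee that ordering. A strict order can be expressed as a sort on two keys: first whether all query terms occur in the title, then the relevance score. The sketch below is plain Java over a hypothetical `Hit` record with made-up example scores (Java 16+ for records), not a Lucene API, and it assumes the title terms are already analyzed the same way as the query terms.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

public class TieredRanking {
    // Hypothetical hit: document id, analyzed title terms, and the engine's relevance score.
    record Hit(String id, Set<String> titleTerms, double score) {}

    static boolean allInTitle(Hit h, List<String> queryTerms) {
        return h.titleTerms().containsAll(queryTerms);
    }

    static List<Hit> rank(List<Hit> hits, List<String> queryTerms) {
        List<Hit> sorted = new ArrayList<>(hits);
        // Primary key: tier. false sorts before true, so "all terms in title" hits come first.
        // Secondary key: score, descending, within each tier.
        sorted.sort(Comparator
                .comparing((Hit h) -> !allInTitle(h, queryTerms))
                .thenComparing(Comparator.comparingDouble(Hit::score).reversed()));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("a", Set.of("hello"), 9.0),          // strong desc match, one term in title
                new Hit("b", Set.of("hello", "world"), 1.5), // all terms in title, low score
                new Hit("c", Set.of(), 12.0));               // desc-only match, highest score
        for (Hit h : rank(hits, List.of("hello", "world"))) {
            System.out.println(h.id()); // b, then c, then a
        }
    }
}
```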
