lucene-java-user mailing list archives

From Li Li <fancye...@gmail.com>
Subject Re: two fields, the first important than the second
Date Fri, 27 Apr 2012 07:40:06 GMT
+(title:hello title:world desc:hello desc:world)
(+title:hello +title:world)^100
(+desc:hello +desc:world)^50
(+title:hello +desc:world)^10
(+desc:hello +title:world)^10
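
Li Li points out later in the thread that these queries get complicated as the number of terms grows. The pattern above can be generated mechanically. The following Java sketch (class, method and constant names are hypothetical) builds the same query-parser string, one clause per line as in the message, for any list of terms. It enumerates every assignment of the terms to title or desc. For two terms it reproduces the query above. For n terms it emits 2^n optional clauses, which shows why the approach scales poorly. A document that contains only "hello" still satisfies the required first clause, so it is matched, just without any of the bonus clauses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FieldBoostQuery {
    // Boost values taken from the message; tune them for your data.
    static final int ALL_TITLE_BOOST = 100;
    static final int ALL_DESC_BOOST = 50;
    static final int MIXED_BOOST = 10;

    // One optional clause: if bit i of mask is set, term i must be in title;
    // otherwise term i must be in desc.
    static String clause(List<String> terms, int mask, int boost) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < terms.size(); i++) {
            String field = ((mask >> i) & 1) == 1 ? "title" : "desc";
            parts.add("+" + field + ":" + terms.get(i));
        }
        return "(" + String.join(" ", parts) + ")^" + boost;
    }

    // Required clause (any term in any field), then one boosted optional
    // clause per title/desc assignment: all-title, all-desc, then mixed ones.
    static String build(List<String> terms) {
        List<String> any = new ArrayList<>();
        for (String t : terms) any.add("title:" + t);
        for (String t : terms) any.add("desc:" + t);

        List<String> lines = new ArrayList<>();
        lines.add("+(" + String.join(" ", any) + ")");
        int allTitle = (1 << terms.size()) - 1;
        lines.add(clause(terms, allTitle, ALL_TITLE_BOOST));
        lines.add(clause(terms, 0, ALL_DESC_BOOST));
        for (int mask = 1; mask < allTitle; mask++) {
            lines.add(clause(terms, mask, MIXED_BOOST));
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("hello", "world")));
    }
}
```

With three terms the builder already emits 1 + 2^3 = 9 lines. The mixed clauses are what make the query grow exponentially.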

The boost values (100, 50, 10, 10) should be tuned carefully: if the tf of a
document is very large, a boost of 10 may not be enough. You can subclass
DefaultSimilarity and override methods such as tf() and idf() to constrain
them to a controllable range.
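
A minimal sketch of the tf() idea, assuming Lucene 3.x, where DefaultSimilarity.tf(float freq) returns the square root of the term frequency. The class name and the cap value are hypothetical. In a real index this logic would live in an overridden tf() of a DefaultSimilarity subclass installed on the searcher. The block avoids Lucene imports so it runs on its own.

```java
public class CappedTf {
    // Hypothetical upper bound on the tf factor; tune against your data.
    static final float MAX_TF = 3.0f;

    // Same shape as Lucene 3.x DefaultSimilarity.tf(float freq), i.e. sqrt(freq),
    // but clamped so a term repeated many times in a long desc field cannot
    // outweigh a boosted title clause.
    static float cappedTf(float freq) {
        return Math.min((float) Math.sqrt(freq), MAX_TF);
    }

    public static void main(String[] args) {
        for (float f : new float[] {1f, 4f, 9f, 100f, 10000f}) {
            System.out.println(f + ": sqrt=" + Math.sqrt(f) + ", capped=" + cappedTf(f));
        }
    }
}
```

Without the cap, a frequency of 10000 gives a tf factor of 100. With it, the factor never exceeds 3, so a fixed boost ratio between clauses stays meaningful.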

On Fri, Apr 27, 2012 at 2:59 PM, Akos Tajti <akos.tajti@gmail.com> wrote:
> Thanks for the detailed explanation. But as I understand it, this query will
> still match only documents that contain both terms (either in the same
> field or in different ones). What if a document contains only
> "hello"? This query will not find it, am I right? But that is what we want to
> achieve: the documents that contain both terms must come first in the
> results, followed by those that contain only one of them.
>
> Ákos
>
>
>
> On Fri, Apr 27, 2012 at 5:17 AM, Li Li <fancyerii@gmail.com> wrote:
>
>> Sorry for some typos.
>> original query: +(title:hello desc:hello) +(title:world desc:world)
>> boosted one:    +(title:hello^2 desc:hello) +(title:world^2 desc:world)
>> last one:       +(title:hello desc:hello) +(title:world desc:world)
>>    (+title:hello +title:world)^10 (+desc:hello +desc:world)^5
>>
>> The example has two terms. With more terms, the query becomes too
>> complicated.
>>
>> On Fri, Apr 27, 2012 at 11:12 AM, Li Li <fancyerii@gmail.com> wrote:
>>
>> > You should describe your ranking strategy more precisely.
>> > If the query has 2 terms, "hello" and "world" for example, and your
>> > search fields are title and description, there are many possible
>> > combinations.
>> > Here is my understanding.
>> > Both terms should occur in title or desc, so
>> >     the query may be +(title:hello desc:hello) +(title:world desc:world)
>> >     The problem is that we want title to weigh more than desc, so we may
>> > rewrite it as
>> >    +(title:hello^2 desc:hello) +(title:world^2 desc:world)
>> >     But consider these two scenarios:
>> >     1. hello hits only in title, world hits only in desc
>> >     2. hello and world both hit in desc
>> >     Because title is boosted, 1 scores higher than 2.
>> >     But we may think 2 is better than 1 because "hello world" is a phrase.
>> > We don't want to use a phrase query, though, because it is so strict
>> > that recall cannot meet our needs.
>> >    Our solution is to modify Lucene so that the boolean scorer can tell
>> > us which terms matched; then our own collector boosts scenario 2. This
>> > solution requires modifying Lucene (I have posted a mail about it; you
>> > can patch your DisjunctionSumScorer with
>> > https://issues.apache.org/jira/browse/LUCENE-2686)
>> >    Another solution I can come up with is a more complicated query:
>> >    +(title:hello desc:hello) +(title:world desc:world)
>> >    (+title:hello +title:world)^10 (+desc:hello +desc:world)^5
>> >    The must-occur condition is the same as before, but if hello and
>> > world both appear in title, we give the document a boost. Similarly, if
>> > hello and world both appear in desc, we also boost it.
>> >
>> >
>> >
>> > On Fri, Apr 27, 2012 at 3:12 AM, Akos Tajti <akos.tajti@gmail.com> wrote:
>> >
>> >> Dear List,
>> >>
>> >> We've been struggling with the following problem for a while:
>> >> we have two fields, title and description. Title is generated from
>> >> short summaries, while description is generated from long texts. We
>> >> want to search on both fields at the same time, but we'd like to get
>> >> all documents in which the title matches the search term before all
>> >> others. For multi-term queries we want to achieve the following: all
>> >> documents that contain all terms in their title must come before every
>> >> other document, no matter how many times the description matches the
>> >> query. Is there a simple way to achieve this?
>> >>
>> >> Thanks in advance,
>> >> Ákos Tajti
>> >>
>> >
>> >
>>
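
The thread compares two queries for scenario 1 (hello only in title, world only in desc) and scenario 2 (both in desc). The Java sketch below is a toy scoring model, not Lucene's scoring formula, that shows why title^2 ranks scenario 1 first and why the extra all-in-one-field clauses reverse that. It assumes each matching term clause adds its boost, each satisfied all-terms clause adds its boost as a flat bonus, and tf, idf, length norms and coord are ignored. Class and method names are hypothetical.

```java
import java.util.Set;

public class ToyScore {
    // Hypothetical document: which query terms occur in which field.
    static final class Doc {
        final Set<String> title;
        final Set<String> desc;

        Doc(Set<String> title, Set<String> desc) {
            this.title = title;
            this.desc = desc;
        }
    }

    // Toy score for +(title:t^2 desc:t) per term: a title hit counts 2,
    // a desc hit counts 1. Assumes the doc satisfies the required clauses.
    static int boostedTitle(Doc d, String... terms) {
        int s = 0;
        for (String t : terms) {
            if (d.title.contains(t)) s += 2;
            if (d.desc.contains(t)) s += 1;
        }
        return s;
    }

    // Toy score for +(title:t desc:t) per term plus the optional
    // all-in-title clause (^10) and all-in-desc clause (^5) as flat bonuses.
    static int fieldBonus(Doc d, String... terms) {
        int s = 0;
        boolean allTitle = true;
        boolean allDesc = true;
        for (String t : terms) {
            if (d.title.contains(t)) s += 1;
            if (d.desc.contains(t)) s += 1;
            allTitle &= d.title.contains(t);
            allDesc &= d.desc.contains(t);
        }
        if (allTitle) s += 10;
        if (allDesc) s += 5;
        return s;
    }

    public static void main(String[] args) {
        Doc s1 = new Doc(Set.of("hello"), Set.of("world"));   // scenario 1
        Doc s2 = new Doc(Set.of(), Set.of("hello", "world")); // scenario 2
        System.out.println("title^2:     s1=" + boostedTitle(s1, "hello", "world")
                + " s2=" + boostedTitle(s2, "hello", "world"));
        System.out.println("field bonus: s1=" + fieldBonus(s1, "hello", "world")
                + " s2=" + fieldBonus(s2, "hello", "world"));
    }
}
```

In this model title^2 gives scenario 1 a score of 3 against 2 for scenario 2. With the bonus clauses, scenario 2 gets 7 against 2. Real Lucene scores will differ in magnitude, but the bonus clause is what changes the order.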

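Akos's original requirement is a strict ordering: every document with all terms in its title comes before any other document. Boosts only approximate that. One way to guarantee it, which is not proposed in the thread, is a two-key sort over the hits: first by tier, then by score. The Java sketch below assumes you already know, for each hit, which query terms occur in its title (for example from the modified scorer Li Li mentions, or by re-checking a stored field). The Hit class and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

public class TieredRanking {
    // Hypothetical hit: relevance score plus the query terms found in the title.
    static final class Hit {
        final String id;
        final double score;
        final Set<String> titleTerms;

        Hit(String id, double score, Set<String> titleTerms) {
            this.id = id;
            this.score = score;
            this.titleTerms = titleTerms;
        }
    }

    // Tier 0: every query term occurs in the title; tier 1: all other hits.
    static int tier(Hit h, Set<String> queryTerms) {
        return h.titleTerms.containsAll(queryTerms) ? 0 : 1;
    }

    // Sort by tier first, then by descending score within a tier, so no
    // description score can lift a hit out of its tier.
    static List<Hit> rank(List<Hit> hits, Set<String> queryTerms) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> tier(h, queryTerms))
                .thenComparing(Comparator.comparingDouble((Hit h) -> h.score).reversed()));
        return sorted;
    }

    public static void main(String[] args) {
        Set<String> q = Set.of("hello", "world");
        List<Hit> hits = List.of(
                new Hit("A", 9.0, Set.of("hello")), // strong desc match
                new Hit("B", 1.5, Set.of("hello", "world")),
                new Hit("C", 4.0, Set.of()),
                new Hit("D", 2.0, Set.of("hello", "world")));
        for (Hit h : rank(hits, q)) {
            System.out.println(h.id + " " + h.score);
        }
    }
}
```

Because the tier dominates the comparator, hit A cannot outrank B or D regardless of its score. The trade-off is that all candidate hits must be collected before sorting, rather than relying on the engine's top-N scoring alone.
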
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

