lucene-java-user mailing list archives

From Akos Tajti <akos.ta...@gmail.com>
Subject Re: two fields, the first important than the second
Date Fri, 27 Apr 2012 06:59:58 GMT
Thanks for the detailed explanation. But as I understand it, this query will
still match only documents that contain both terms (either in the same
field or in different ones). What if there's a document that contains only
"hello"? This query will not find it, am I right? But we do want such
documents to be found. So in the results, documents that contain both terms
have to come first, then those that contain only one of them.

Ákos
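
One way to let single-term documents match is to drop the `+` from the per-term groups, so every clause becomes optional. Below is a minimal sketch in Java that only builds the query string; it does not call Lucene. The class and method names are hypothetical, and it assumes the terms are plain words with no query-syntax characters and that the query parser's default operator is OR. The boosts 2, 10 and 5 are taken from the quoted reply. Boosts only influence the score, so a strict "both terms first" ordering is not guaranteed.

```java
import java.util.List;
import java.util.StringJoiner;

public class RelaxedQuerySketch {
    // Hypothetical helper: builds a query string in Lucene's classic
    // query-parser syntax in which every clause is optional.
    static String relaxedQuery(List<String> terms) {
        StringJoiner query = new StringJoiner(" ");
        StringJoiner allInTitle = new StringJoiner(" ");
        StringJoiner allInDesc = new StringJoiner(" ");
        for (String term : terms) {
            // No leading '+': a document matching only this term still matches.
            query.add("(title:" + term + "^2 desc:" + term + ")");
            allInTitle.add("+title:" + term);
            allInDesc.add("+desc:" + term);
        }
        // Optional boosted clauses that reward all terms in a single field.
        query.add("(" + allInTitle + ")^10");
        query.add("(" + allInDesc + ")^5");
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(relaxedQuery(List.of("hello", "world")));
    }
}
```

For "hello" and "world" this yields `(title:hello^2 desc:hello) (title:world^2 desc:world) (+title:hello +title:world)^10 (+desc:hello +desc:world)^5`. A document containing only "hello" matches the first clause, and documents matching more clauses generally score higher.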



On Fri, Apr 27, 2012 at 5:17 AM, Li Li <fancyerii@gmail.com> wrote:

> sorry for some typos.
> original query +(title:hello desc:hello) +(title:world desc:world)
> boosted one   +(title:hello^2 desc:hello) +(title:world^2 desc:world)
> last one     +(title:hello desc:hello) +(title:world desc:world)
>    (+title:hello +title:world)^10 (+desc:hello +desc:world)^5
>
> the example has two terms. if it has more terms, the query will become too
> complicated.
>
> On Fri, Apr 27, 2012 at 11:12 AM, Li Li <fancyerii@gmail.com> wrote:
>
> > You should describe your ranking strategy more precisely.
> > If the query has 2 terms, "hello" and "world" for example, and your
> > search fields are title and description, there are many possible
> > combinations.
> > Here is my understanding.
> > Both terms should occur in title or desc.
> >     The query may be +(title:hello desc:hello) +(title:world desc:world)
> >     The problem is that we need title to weigh more than desc, so maybe we
> > rewrite it to
> >    +(title:hello^2 desc:hello) +(title:world^2 desc:world)
> >     But consider these two scenarios:
> >     1. hello hits only in title, world hits only in desc
> >     2. hello and world both hit in desc
> >     Because title is boosted, 1 scores higher than 2.
> >     But we may think 2 is better than 1 because "hello world" is a phrase.
> > But we don't want to use a phrase query because it is too strict, so the
> > recall can't meet our needs.
> >    Our solution is to modify Lucene so that the boolean scorer can tell us
> > which terms matched. Then we use our own collector to boost scenario 2. This
> > solution requires modifying Lucene (I have posted a mail, and you can patch
> > your DisjunctionSumScorer with
> > https://issues.apache.org/jira/browse/LUCENE-2686).
> >    Another solution I can come up with is using a more complicated query:
> >    +(title:hello desc:hello) +(title:world desc:world)
> >    (+title:hello +title:world)^10 (+desc:hello +desc:world)^5
> >    The must-occur condition is the same as before, but if hello and world
> > are both in title, we give it a boost. Similarly, if hello and world are
> > both in desc, we also boost it.
> >
> >
> >
> > On Fri, Apr 27, 2012 at 3:12 AM, Akos Tajti <akos.tajti@gmail.com> wrote:
> >
> >> Dear List,
> >>
> >> we've been struggling with the following problem for a while:
> >> we have two fields: title and description. Title is generated from short
> >> summaries while description is generated from long texts. We want to search
> >> on both fields at the same time, but we'd like to get all documents in which
> >> the title matches the search term before all others. For multi-term queries
> >> we want to achieve the following: all documents that contain all terms in
> >> their title must come before every other document, no matter how many times
> >> the description matches the query. Is there a simple way to achieve this?
> >>
> >> Thanks in advance,
> >> Ákos Tajti
> >>
> >
> >
>
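
The quoted reply notes that the query gets complicated once there are more than two terms. One way to keep it manageable is to generate the string programmatically. The sketch below does this for the strict variant, where every term must occur in title or desc. It only does string building and makes no Lucene calls. The class name, method name and boost parameters are hypothetical, and terms are assumed to be plain words that need no escaping.

```java
import java.util.List;
import java.util.StringJoiner;

public class StrictQuerySketch {
    // Hypothetical helper: every term is required in title or desc, and
    // two optional clauses boost documents with all terms in one field.
    static String strictQuery(List<String> terms, int titleBoost, int descBoost) {
        StringJoiner required = new StringJoiner(" ");
        StringJoiner allInTitle = new StringJoiner(" ");
        StringJoiner allInDesc = new StringJoiner(" ");
        for (String term : terms) {
            // Leading '+': the term must occur in at least one of the fields.
            required.add("+(title:" + term + " desc:" + term + ")");
            allInTitle.add("+title:" + term);
            allInDesc.add("+desc:" + term);
        }
        return required + " (" + allInTitle + ")^" + titleBoost
                + " (" + allInDesc + ")^" + descBoost;
    }

    public static void main(String[] args) {
        // Three terms instead of two; the structure stays the same.
        System.out.println(strictQuery(List.of("hello", "big", "world"), 10, 5));
    }
}
```

With the terms "hello" and "world" and boosts 10 and 5, this reproduces the two-term query from the reply: `+(title:hello desc:hello) +(title:world desc:world) (+title:hello +title:world)^10 (+desc:hello +desc:world)^5`.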
