Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: error (athena.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: <033601d03959$4c2ffbb0$e48ff310$@thetaphi.de>
References: 
 <OFF475C53C.E1DA866D-ON85257DD8.000DE87A-85257DD8.000E3FA0@us.ibm.com>
 <CAL8PwkZM316bYTiK4X1g72_Tg8g8pZ0mMGPCJ9sXsO73O6Z=Xg@mail.gmail.com>
 <OF32E3D150.793FD0B0-ON85257DD8.007CBD3A-85257DD8.007D0786@us.ibm.com>
 <CAL8Pwkbb9F-LYtovDTVz-h6Od++VS6kuwcmYtp-P4uJ3xsuapg@mail.gmail.com>
 <54C61BB1.5090304@gmail.com> <033601d03959$4c2ffbb0$e48ff310$@thetaphi.de>
From: Michael McCandless <lucene@mikemccandless.com>
Date: Mon, 26 Jan 2015 09:06:38 -0500
Message-ID: 
 <CAL8PwkY526WKT_h3utxoQpFWjuH9zEBBhNs7RLihvEbEqMKAMg@mail.gmail.com>
Subject: Re: Absolute term position in scoring
To: Lucene Users <java-user@lucene.apache.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

A custom query could improve on the situation by not pulling multiple
docs/positions enums for a single term.  E.g. the patch on
https://issues.apache.org/jira/browse/LUCENE-5288 (which never got
committed: too controversial) has such a query, letting you customize
how positions are scored for boolean term query matches.  Maybe you
could start from it and compare its performance against the
SpanFirstQuery approach...
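
To make the two options concrete, here is a minimal sketch in plain
Java with no Lucene dependency. It contrasts a step bonus (roughly what
a boosted SHOULD SpanFirstQuery clause adds) with a smooth per-position
weight (the kind of thing a custom query could compute). All names here
(PositionScoringSketch, spanFirstBonus, positionDecay, halfLife) are
hypothetical; none of them are Lucene APIs or taken from the
LUCENE-5288 patch, and the decay formula is just one assumed choice.

```java
import java.util.Arrays;

/** Hypothetical, Lucene-free sketch of two ways to reward early term positions. */
final class PositionScoringSketch {

    /**
     * Step bonus in the spirit of a boosted SHOULD SpanFirstQuery clause:
     * the full boost is added if any occurrence of the term falls within
     * the first `end` positions, otherwise nothing is added.
     */
    static float spanFirstBonus(int[] positions, int end, float boost) {
        for (int pos : positions) {
            // A single-term span at position pos ends at pos + 1,
            // so it lies within the first `end` positions when pos < end.
            if (pos < end) {
                return boost;
            }
        }
        return 0f;
    }

    /**
     * Smooth alternative (an assumed formula, not Lucene's): a weight that is
     * 1.0 when the earliest occurrence is at position 0, 0.5 when it is at
     * position `halfLife`, and tends toward 0 for occurrences far from the top.
     */
    static float positionDecay(int[] positions, float halfLife) {
        if (positions.length == 0) {
            return 0f; // term does not occur: no position-based credit
        }
        int first = Arrays.stream(positions).min().getAsInt();
        return halfLife / (halfLife + first);
    }

    public static void main(String[] args) {
        int[] early = {2, 57};    // term positions in a doc that mentions the term near the top
        int[] late = {120, 300};  // term positions in a doc that mentions it only later

        System.out.println("step bonus, early doc: " + spanFirstBonus(early, 10, 2.0f));
        System.out.println("step bonus, late doc:  " + spanFirstBonus(late, 10, 2.0f));
        System.out.println("decay, early doc:      " + positionDecay(early, 10f));
        System.out.println("decay, late doc:       " + positionDecay(late, 10f));
    }
}
```

The step bonus treats every position inside the window the same, while
the decay keeps distinguishing position 2 from position 9; that
difference is what a custom query would buy you.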

Mike McCandless

http://blog.mikemccandless.com


On Mon, Jan 26, 2015 at 6:14 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
> Hi,
>
> it depends on the query structure. In fact, SpanFirstQuery is slow (all
> span queries are slow because of position use; this may improve in the
> future).
>
> Your question was about using multiple fields. In fact, querying for the
> same terms on multiple fields and/or with different query types is the
> standard approach to tuning relevance! But it always has a cost. In most
> cases you will not see a large difference (unless you use phrase or span
> queries). A very good explanation of what can be done with this approach
> is in the Elasticsearch Guide:
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-field-search.html
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Alexey Morozov [mailto:morozov@gmail.com]
>> Sent: Monday, January 26, 2015 11:49 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: Absolute term position in scoring
>>
>> Hello!
>>
>> I'd like to ask whether this approach (constructing a complex query
>> consisting of a boosted "specialized" part and an "ordinary" part with
>> no boost) necessarily causes a significant performance degradation
>> compared to a "custom query" specialized for a particular need.
>>
>> Thanks in advance,
>> Alexey Morozov
>>
>> 26.01.2015 14:57, Michael McCandless wrote:
>> > Well you could have ordinary term queries, and then a SHOULD
>> > SpanFirstQuery clause with a boost, to give higher scores to those
>> > docs that also had the term(s) close to the start of the document.
>> >
>> >
>> > Mike McCandless
>> >
>> > http://blog.mikemccandless.com
>> >
>> > On Sun, Jan 25, 2015 at 5:44 PM, Luis A Lastras <lastrasl@us.ibm.com>
>> wrote:
>> >
>> >> Thanks, I didn't know about SpanFirstQuery. I can likely get something
>> >> going with that. I was still hoping that we could affect the scoring
>> >> formula with the position itself, but maybe this is not feasible.
>> >>
>> >> Luis
>> >>
>> >>
>> >>
>> >>    ------------------------------
>> >>
>> >>
>> >>
>> >> *Luis A Lastras, Ph.D. Research Staff Member & Manager, Concept
>> Analytics,
>> >>    IBM Watson*
>> >>    *Member of the IBM Academy of Technology*
>> >>
>> >> *IBM Master Inventor email: **lastrasl@us.ibm.com*
>> >> <email@region.ibm.com>
>> >> * | Tel: 914-945-3613 <914-945-3613> | Cell: 914-382-1879 <914-382-1879>
>> >>    address:  1101 Kitchawan Rd, Office 28-132, Yorktown Heights, NY,
>> >> 10598*
>> >>
>> >>
>> >>
>> >>
>> >>    <http://www.facebook.com/ibmwatson>
>> >>
>> >>
>> >>    ------------------------------
>> >>
>> >>
>> >>
>> >>
>> >> From: Michael McCandless <lucene@mikemccandless.com>
>> >> To: Lucene Users <java-user@lucene.apache.org>
>> >> Date: 01/25/2015 08:12 AM
>> >> Subject: Re: Absolute term position in scoring
>> >> ------------------------------
>> >>
>> >>
>> >>
>> >> Maybe SpanFirstQuery?
>> >>
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >> On Sat, Jan 24, 2015 at 9:34 PM, Luis A Lastras <lastrasl@us.ibm.com>
>> >> wrote:
>> >>
>> >>> Is it possible to incorporate the position of a matching term (say,
>> >>> as measured from the top of the document) into Lucene's scoring
>> >>> function? The scenario is this: if the documents tend to talk about
>> >>> the most important stuff at the beginning, then we would like to
>> >>> give preference to documents that mention a term close to the top.
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Luis
>> >>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org