lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Absolute term position in scoring
Date Mon, 26 Jan 2015 14:06:38 GMT
A custom query could improve on the situation by not pulling multiple
docs/positions enums for a single term.  E.g. the patch on
https://issues.apache.org/jira/browse/LUCENE-5288 (which never got
committed: too controversial) has such a query, letting you customize
how positions are scored for boolean term query matches.  Maybe you
could start from it and see how performance compares vs the
SpanFirstQuery approach...
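
As a rough illustration of the trade-off, here is a toy model in plain Java.
It is not Lucene code and not the LUCENE-5288 patch. The first method mimics
an ordinary term match plus a boosted SHOULD clause that fires when the term
occurs within the first `end` positions (assumed here to be how a
SpanFirstQuery over a single term behaves). The second mimics a custom query
that reads the term's positions once and folds the first position into the
score. The class name, method names, the base score of 1.0 and the decay
formula are all assumptions made up for this sketch.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model only: plain Java, no Lucene classes involved. */
public class PositionScoringSketch {

    /**
     * Models "term query + boosted SHOULD span-first clause".
     * positions: 0-based positions of the term in one document, ascending.
     * A hypothetical base score of 1.0 is given for any match; the boost is
     * added once if some occurrence lies within the first `end` positions.
     */
    public static double termPlusSpanFirst(List<Integer> positions, int end, double boost) {
        if (positions.isEmpty()) {
            return 0.0; // term absent: document does not match
        }
        double score = 1.0; // assumed flat base score for the term match
        for (int pos : positions) {
            if (pos + 1 <= end) { // single-term span [pos, pos + 1) ends within `end`
                score += boost;
                break; // the optional clause contributes at most once
            }
        }
        return score;
    }

    /**
     * Models a custom position-aware query: a single look at the positions,
     * with a bonus that shrinks as the first occurrence moves further
     * from the top of the document. The decay formula is an assumption.
     */
    public static double positionAware(List<Integer> positions) {
        if (positions.isEmpty()) {
            return 0.0;
        }
        int first = positions.get(0); // earliest occurrence, since the list is ascending
        return 1.0 + 1.0 / (1.0 + first);
    }

    public static void main(String[] args) {
        List<Integer> early = Arrays.asList(2, 40); // term appears near the top
        List<Integer> late = Arrays.asList(57);     // term appears only further down

        System.out.println(termPlusSpanFirst(early, 10, 2.0));
        System.out.println(termPlusSpanFirst(late, 10, 2.0));
        System.out.println(positionAware(early));
        System.out.println(positionAware(late));
    }
}
```

With positions [2, 40] and end = 10, the first method returns 3.0 (base 1.0
plus boost 2.0); with positions [57] it returns 1.0. The second method gives
a graded preference instead of an all-or-nothing boost.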

Mike McCandless

http://blog.mikemccandless.com


On Mon, Jan 26, 2015 at 6:14 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
> Hi,
>
> It depends on the query structure. In fact, SpanFirstQuery is slow (all span queries
> are slow because they use positions; this may improve in the future).
>
> Your question was about using multiple fields. In fact, querying for the same terms on
> multiple fields and/or with different query types is the standard approach to tuning relevance!
> But it always has a cost. In most cases you will not see a large difference (unless you use
> phrase or span queries). A very good explanation of what can be done this way is in the
> Elasticsearch Guide: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-field-search.html
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Alexey Morozov [mailto:morozov@gmail.com]
>> Sent: Monday, January 26, 2015 11:49 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: Absolute term position in scoring
>>
>> Hello!
>>
>> I'd like to ask whether this approach (constructing a complex query from a
>> boosted "specialized" part and an "ordinary" part with no boost) [necessarily]
>> causes a significant performance degradation compared to a "custom query"
>> specialized for a particular need.
>>
>> Thanks in advance,
>> Alexey Morozov
>>
>> On 26.01.2015 14:57, Michael McCandless wrote:
>> > Well you could have ordinary term queries, and then a SHOULD
>> > SpanFirstQuery clause with a boost, to give higher scores to those
>> > docs that also had the term(s) close to the start of the document.
>> >
>> >
>> > Mike McCandless
>> >
>> > http://blog.mikemccandless.com
>> >
>> > On Sun, Jan 25, 2015 at 5:44 PM, Luis A Lastras <lastrasl@us.ibm.com> wrote:
>> >
>> >> Thanks, I didn't know about SpanFirstQuery. I can likely get something
>> >> going with that. I was still hoping that we could affect the scoring
>> >> formula with the position itself, but maybe this is not feasible.
>> >>
>> >> Luis
>> >>
>> >> ------------------------------
>> >> Luis A Lastras, Ph.D.
>> >> Research Staff Member & Manager, Concept Analytics, IBM Watson
>> >> Member of the IBM Academy of Technology
>> >> IBM Master Inventor
>> >> email: lastrasl@us.ibm.com | Tel: 914-945-3613 | Cell: 914-382-1879
>> >> address: 1101 Kitchawan Rd, Office 28-132, Yorktown Heights, NY, 10598
>> >> <http://www.facebook.com/ibmwatson>
>> >> ------------------------------
>> >>
>> >> From: Michael McCandless <lucene@mikemccandless.com>
>> >> To: Lucene Users <java-user@lucene.apache.org>
>> >> Date: 01/25/2015 08:12 AM
>> >> Subject: Re: Absolute term position in scoring
>> >> ------------------------------
>> >>
>> >>
>> >>
>> >> Maybe SpanFirstQuery?
>> >>
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >> On Sat, Jan 24, 2015 at 9:34 PM, Luis A Lastras <lastrasl@us.ibm.com>
>> >> wrote:
>> >>
>> >>> Is it possible to incorporate in Lucene's scoring function the position of
>> >>> a matching term (say, as measured from the top of the document)? The
>> >>> scenario is: if the documents tend to talk about the most
>> >>> important stuff at the beginning of the document, then we would like
>> >>> to give preference to documents that mention a term close to the top.
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Luis
>> >>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org