From: Michael McCandless
Date: Mon, 26 Jan 2015 09:06:38 -0500
Subject: Re: Absolute term position in scoring
To: Lucene Users

A custom query could improve on the situation by not pulling multiple
docs/positions enums for a single term.

E.g. the patch on https://issues.apache.org/jira/browse/LUCENE-5288
(which never got committed: too controversial) has such a query,
letting you customize how positions are scored for boolean term query
matches.

Maybe you could start from it and see how performance compares vs the
SpanFirstQuery approach...

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jan 26, 2015 at 6:14 AM, Uwe Schindler wrote:
> Hi,
>
> it depends on the query structure. In fact, SpanFirstQuery is slow
> (all span queries are slow because of position use; this may improve
> in the future).
>
> Your question was about using multiple fields - in fact querying for
> the same terms on multiple fields and/or different query types: This
> is the standard approach to tune the relevance! But it always has a
> cost. In most cases you will not see a large difference (unless you
> use phrase or span queries).
> A very good explanation of what can be done using this is given in
> the Elasticsearch Guide:
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-field-search.html
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>> -----Original Message-----
>> From: Alexey Morozov [mailto:morozov@gmail.com]
>> Sent: Monday, January 26, 2015 11:49 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: Absolute term position in scoring
>>
>> Hello!
>>
>> I'd like to ask whether this approach (constructing a complex query
>> consisting of a boosted "specialized" part and an "ordinary" part with
>> no boost) [necessarily] causes a significant performance degradation
>> compared to a "custom query" specialized for a particular need.
>>
>> Thanks in advance,
>> Alexey Morozov
>>
>> 26.01.2015 14:57, Michael McCandless wrote:
>> > Well you could have ordinary term queries, and then a SHOULD
>> > SpanFirstQuery clause with a boost, to give higher scores to those
>> > docs that also had the term(s) close to the start of the document.
>> >
>> > Mike McCandless
>> >
>> > http://blog.mikemccandless.com
>> >
>> > On Sun, Jan 25, 2015 at 5:44 PM, Luis A Lastras wrote:
>> >
>> >> Thanks, I didn't know about SpanFirstQuery. I can likely get something
>> >> going with that. I was still hoping that we could affect the scoring
>> >> formula with the position itself, but maybe this is not feasible.
>> >>
>> >> Luis
>> >>
>> >> ------------------------------
>> >>
>> >> *Luis A Lastras, Ph.D.
>> >> Research Staff Member & Manager, Concept Analytics, IBM Watson*
>> >> *Member of the IBM Academy of Technology*
>> >>
>> >> *IBM Master Inventor email: lastrasl@us.ibm.com*
>> >> *| Tel: 914-945-3613 | Cell: 914-382-1879
>> >> address: 1101 Kitchawan Rd, Office 28-132, Yorktown Heights, NY, 10598*
>> >>
>> >> ------------------------------
>> >>
>> >> From: Michael McCandless
>> >> To: Lucene Users
>> >> Date: 01/25/2015 08:12 AM
>> >> Subject: Re: Absolute term position in scoring
>> >> ------------------------------
>> >>
>> >> Maybe SpanFirstQuery?
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >> On Sat, Jan 24, 2015 at 9:34 PM, Luis A Lastras wrote:
>> >>
>> >>> Is it possible to incorporate in Lucene's scoring function the
>> >>> position of a matching term (say, as measured from the top of the
>> >>> document)? The scenario is: if the documents in the set tend to talk
>> >>> about the most important stuff at the beginning of the document,
>> >>> then we would like to give preference to documents that mention a
>> >>> term close to the top.
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Luis
>> >>>
>> >>> ------------------------------
>> >>>
>> >>> *Luis A Lastras, Ph.D.
>> >>> Research Staff Member & Manager, Concept Analytics, IBM Watson*
>> >>> *Member of the IBM Academy of Technology*
>> >>>
>> >>> *IBM Master Inventor email: lastrasl@us.ibm.com*
>> >>> *| Tel: 914-945-3613 | Cell: 914-382-1879
>> >>> address: 1101 Kitchawan Rd, Office 28-132, Yorktown Heights, NY, 10598*
>> >>>
>> >>> ------------------------------
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
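
As a rough illustration of the suggestion quoted in this thread (ordinary
term queries plus an optional, boosted "near the start" clause), the toy
model below shows how such a clause lifts documents that mention the term
early. It does not use Lucene and is not Lucene's scoring formula: the
class PositionBoostSketch, its method names, the term-count base score,
and the end/boost parameters are all assumptions made up for this sketch.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: NOT Lucene code and NOT Lucene's scoring formula.
// All names and the scoring rule are hypothetical.
class PositionBoostSketch {

    // Stand-in for an ordinary term clause: score = number of occurrences.
    static double termScore(List<String> tokens, String term) {
        double count = 0;
        for (String token : tokens) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // Stand-in for a "near the start" clause: true if the term occurs
    // at a zero-based position strictly below `end`.
    static boolean matchesNearStart(List<String> tokens, String term, int end) {
        int limit = Math.min(end, tokens.size());
        for (int i = 0; i < limit; i++) {
            if (tokens.get(i).equals(term)) {
                return true;
            }
        }
        return false;
    }

    // Both clauses are optional: a document with no occurrence scores 0,
    // and the near-start clause only adds `boost` on top of the base score.
    static double score(List<String> tokens, String term, int end, double boost) {
        double base = termScore(tokens, term);
        if (base == 0) {
            return 0;
        }
        return matchesNearStart(tokens, term, end) ? base + boost : base;
    }

    public static void main(String[] args) {
        List<String> early = Arrays.asList("lucene", "is", "a", "search", "library");
        List<String> late = Arrays.asList("a", "search", "library", "called", "lucene");
        System.out.println("early: " + score(early, "lucene", 2, 2.0));
        System.out.println("late:  " + score(late, "lucene", 2, 2.0));
    }
}
```

Both documents still match in this model; the one with the term inside the
first `end` positions simply ranks higher. Whether the extra clause is cheap
enough in a real index is the performance question raised in the thread, and
this sketch says nothing about that.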