From: Spyros Kapnissis
To: java-user@lucene.apache.org
Date: Mon, 8 Aug 2016 22:45:52 +0000 (UTC)
Subject: Re: BooleanQuery rewrite optimization
Message-ID: <1591078353.14343697.1470696352098.JavaMail.yahoo@mail.yahoo.com>

Hm, I hadn't really thought about the minShouldMatch part. I thought it'd be covered, but I see your point about the query being semantically different if you keep it as is.

However... Running your edge case example on an actual local index, I get the following:

"(X X Y #X)" w/minshouldmatch=2 vs. (+X X Y) w/minshouldmatch=2 => same top score, fewer results in second case.
"(X X Y #X)" w/minshouldmatch=2 vs. (+X X Y) w/minshouldmatch=1 => same top score, same number of results
"(X X X Y #X)" w/minshouldmatch=3 vs. (+X X X Y) w/minshouldmatch=2 => same top score, same number of results

But I'm still not really convinced myself that decrementing minshouldmatch by 1 will do the trick. I'll have to verify - maybe I'll try more examples to see if it holds as a general case. Nice exercise either way :)

On Tuesday, August 9, 2016 12:40 AM, Chris Hostetter wrote:

Off the top of my head, i think any optimization like that would also need to account for minNrShouldMatch, wouldn't it?

if your query is "(X Y Z #X)" w/minshouldmatch=2, and you rewrite that query to "(+X Y Z)" w/minshouldmatch=2 you now have a semantically diff query that won't match as many documents as the original.

in that example, you could decrement minshouldmatch (=1) ... but i'm not sure if that holds as a general rule for all possible permutations/values ... i'd have to think about it.

An interesting edge case to think about is "(X X Y #X)" w/minshouldmatch=2 ... pretty sure that would give you very diff scores if you rewrote it to "(+X X Y)" (or "(+X Y)") w/minshouldmatch=1

: Hello all, I noticed while debugging a query that BooleanQuery will
: rewrite itself to remove FILTER clauses that are also MUST as an
: optimization/simplification, which makes total sense. So (+f:x #f:x)
: will become (+f:x). However, shouldn't there also be another
: optimization to remove FILTER clauses that are also SHOULD, while
: converting them to MUST? So, e.g., query (f:x #f:x) will become
: (+f:x). I did an initial simple implementation and the tests seem to
: pass. Are there any cases where this does not hold?

-Hoss
http://www.lucidworks.com/
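
For anyone who wants to poke at the minShouldMatch question from this thread without an index, here is a rough toy model in plain Java. It is not Lucene code: the class MinShouldMatchSketch and its methods matches and matchingDocs are hypothetical names made up for this sketch, and the query strings just reuse the notation from the thread ('+' for MUST, '#' for FILTER, no prefix for SHOULD). It assumes that every SHOULD clause, duplicates included, counts separately toward minShouldMatch, and it only looks at which documents match, not at scores.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MinShouldMatchSketch {

    /**
     * Toy matcher (an assumption-laden model, not Lucene's implementation).
     * A document is just the set of terms it contains.
     */
    static boolean matches(String query, int minShouldMatch, Set<String> doc) {
        int shouldHits = 0;
        boolean hasRequired = false;
        for (String token : query.trim().split("\\s+")) {
            boolean required = token.startsWith("+") || token.startsWith("#");
            String term = required ? token.substring(1) : token;
            boolean hit = doc.contains(term);
            if (required) {
                hasRequired = true;
                if (!hit) {
                    return false; // a MUST or FILTER clause failed
                }
            } else if (hit) {
                shouldHits++; // assumption: duplicate SHOULD clauses each count
            }
        }
        // With no required clause, at least one SHOULD clause has to match.
        int needed = hasRequired ? minShouldMatch : Math.max(1, minShouldMatch);
        return shouldHits >= needed;
    }

    /** Every subset of the vocabulary (treated as a document) that the query matches. */
    static Set<Set<String>> matchingDocs(String query, int minShouldMatch, List<String> vocabulary) {
        Set<Set<String>> result = new HashSet<>();
        for (int mask = 0; mask < (1 << vocabulary.size()); mask++) {
            Set<String> doc = new HashSet<>();
            for (int i = 0; i < vocabulary.size(); i++) {
                if ((mask & (1 << i)) != 0) {
                    doc.add(vocabulary.get(i));
                }
            }
            if (matches(query, minShouldMatch, doc)) {
                result.add(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> vocab = Arrays.asList("X", "Y", "Z");
        Set<Set<String>> original = matchingDocs("X Y Z #X", 2, vocab);
        System.out.println("rewrite, same minShouldMatch: "
                + original.equals(matchingDocs("+X Y Z", 2, vocab)));
        System.out.println("rewrite, minShouldMatch - 1:  "
                + original.equals(matchingDocs("+X Y Z", 1, vocab)));
        System.out.println("edge case (X X Y #X):         "
                + matchingDocs("X X Y #X", 2, vocab).equals(matchingDocs("+X X Y", 1, vocab)));
    }
}
```

Under these assumptions the model agrees with the result counts reported above: keeping minShouldMatch unchanged after the rewrite matches fewer documents, while decrementing it by 1 matches the same set, including for the duplicated-clause edge cases. It says nothing about whether scores stay the same, which is the part Hoss was worried about.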