Message-ID: <24670376.post@talk.nabble.com>
Date: Sun, 26 Jul 2009 13:53:45 -0700 (PDT)
From: Edoardo Marcora
To: general@lucene.apache.org
Subject: Re: Boolean query with 50,000 clauses! Possible? Scalable?
References: <24664839.post@talk.nabble.com> <4A6CA0C5.2070209@ice-sa.com>

No logical structure really. See my reply to awarnier above.

Ted Dunning wrote:
>
> To put a bit more meat on this question, it is often possible to find
> structure in the term space that would allow you to do a much simpler
> query by using a much smaller number of more general covering terms.
>
> A great example of this is in numeric queries, especially using the Trie
> based range queries in 2.9. We know that numbers have the structure of a
> completely ordered set. This means that a numeric field can be translated
> into multiple fields at differing levels of resolution where each value in
> the additional fields covers many values in the original. A range query
> can be translated into some small number of terms in the low resolution
> fields and a few residual terms in the higher resolution fields. The
> resulting query can have multiple orders of magnitude fewer terms.
>
> So is there corresponding logical structure in your 50,000 terms?
>
> On Sun, Jul 26, 2009 at 11:30 AM, André Warnier wrote:
>
>> Edoardo Marcora wrote:
>>
>>> I am faced with the requirement for a boolean query composed of 50,000
>>> clauses (all of them directed at the same field) all OR'ed together.
>>>
>> By pure intellectual curiosity : can you provide some idea of the type of
>> query, and the type of content of the field this is targeted at ?
>> I have this notion that with 50,000 queries directed at one field, there
>> must be some smarter way of handling this than just OR-ing together the
>> results.
>>
>
> --
> Ted Dunning, CTO
> DeepDyve
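
For readers following the thread, the multi-resolution idea Ted describes for numeric fields can be sketched in a few lines of Python. This is only an illustration of the covering-terms principle, not Lucene's trie implementation: `cover_range`, `expand`, `step` and `max_shift` are hypothetical names chosen here, and the sketch assumes non-negative integer values. It only helps when the terms have an ordering to exploit, which, per the reply above, is not the case for the 50,000 terms in question.

```python
def cover_range(lo, hi, step=4, max_shift=28):
    """Cover the inclusive integer range [lo, hi] with (shift, prefix) terms.

    A term (shift, prefix) stands for every value v with v >> shift == prefix,
    i.e. one aligned block of 2**shift consecutive values. Terms with a larger
    shift play the role of the "low resolution" fields in the discussion.
    Assumes lo and hi are non-negative integers (an assumption of this sketch).
    """
    terms = []
    while lo <= hi:
        shift = 0
        # Grow the block one resolution level at a time, as long as it stays
        # aligned at lo and does not run past hi.
        while shift + step <= max_shift:
            size = 1 << (shift + step)
            if lo % size == 0 and lo + size - 1 <= hi:
                shift += step
            else:
                break
        terms.append((shift, lo >> shift))
        lo += 1 << shift
    return terms


def expand(terms):
    """Return every value matched by the given terms, in order."""
    values = []
    for shift, prefix in terms:
        start = prefix << shift
        values.extend(range(start, start + (1 << shift)))
    return values


if __name__ == "__main__":
    terms = cover_range(1000, 50999)
    print(len(terms), "terms cover", len(expand(terms)), "values")
```

Each value would be indexed once per resolution level (its prefix at shift 0, 4, 8, and so on), so a range over tens of thousands of distinct values can be answered by OR-ing a small number of prefix terms instead of one clause per value. A smaller `step` gives fewer query terms per level boundary at the cost of more indexed terms per value.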