lucene-general mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Boolean query with 50,000 clauses! Possible? Scalable?
Date Sun, 26 Jul 2009 18:39:21 GMT
To put a bit more meat on this question, it is often possible to find
structure in the term space that would allow you to do a much simpler query
by using a much smaller number of more general covering terms.

A great example of this is numeric queries, especially the trie-based
range queries in Lucene 2.9.  We know that numbers have the structure of a
totally ordered set.  This means that a numeric field can be translated
into multiple fields at differing levels of resolution where each value in
the additional fields covers many values in the original.  A range query can
be translated into some small number of terms in the low resolution fields
and a few residual terms in the higher resolution fields.  The resulting
query can have multiple orders of magnitude fewer terms.
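
To make the covering-term idea concrete, here is a minimal sketch in Python.
It is not Lucene's actual code; the names cover_range and covered_values, the
(shift, prefix) term representation, and the default step of 4 bits are all
assumptions made for illustration.  Each term (shift, prefix) stands for the
aligned block of integers whose value shifted right by `shift` bits equals
`prefix`, so a term with a larger shift lives in a lower-resolution field.

```python
def cover_range(lo, hi, step=4, max_shift=28):
    """Cover the inclusive integer range [lo, hi] with aligned blocks.

    Returns a list of (shift, prefix) terms. A term covers the values
    prefix << shift through ((prefix + 1) << shift) - 1.
    Hypothetical helper for illustration, not a Lucene API.
    """
    terms = []
    p = lo
    while p <= hi:
        shift = 0
        # Grow the block while it stays aligned at p and inside the range.
        while (shift + step <= max_shift
               and p % (1 << (shift + step)) == 0
               and p + (1 << (shift + step)) - 1 <= hi):
            shift += step
        terms.append((shift, p >> shift))
        p += 1 << shift  # move past the block just emitted
    return terms


def covered_values(terms):
    """Expand terms back into the sorted list of integers they match."""
    values = []
    for shift, prefix in terms:
        start = prefix << shift
        values.extend(range(start, start + (1 << shift)))
    return sorted(values)


if __name__ == "__main__":
    terms = cover_range(1000, 50999)
    print(len(terms), "terms cover", 50999 - 1000 + 1, "values")
```

The coarse blocks in the middle of the range absorb most of the values, and
only the ragged ends need full-resolution terms, which is where the large
reduction in term count comes from.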

So is there corresponding logical structure in your 50,000 terms?

On Sun, Jul 26, 2009 at 11:30 AM, André Warnier <aw@ice-sa.com> wrote:

> Edoardo Marcora wrote:
>
>> I am faced with the requirement for a boolean query composed of 50,000
>> clauses (all of them directed at the same field) all OR'ed together.
>>
> By pure intellectual curiosity : can you provide some idea of the type of
> query, and the type of content of the field this is targeted at ?
> I have this notion that with 50,000 queries directed at one field, there
> must be some smarter way of handling this than just OR-ing together the
> results.
>



-- 
Ted Dunning, CTO
DeepDyve
