lucene-general mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Boolean query with 50,000 clauses! Possible? Scalable?
Date Tue, 28 Jul 2009 15:37:27 GMT
> Author documents also have other attributes, for example "Weight". I want a
> query that gives every book document authored by people weighing more than
> 200lbs, with the ability to do faceting and the like.

For that you should use a RangeQuery. As it is a numeric value, the new
NumericRangeQuery in the not yet released 2.9 can do this very fast. You only
need to index the weight using NumericField/NumericTokenStream.
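
To make the idea concrete, here is a minimal plain-Java sketch of the two steps
involved. It deliberately does not use the Lucene API. The class and method
names, the in-memory maps, and the two-pass approach are all assumptions made
for illustration only. Step 1 stands in for what a NumericRangeQuery on the
weight field would return (the IDs of authors above a threshold), and step 2
keeps the books whose author ID is in that set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only: plain collections stand in for the index.
class AuthorBookJoin {

    // Step 1: range selection over a sorted structure, analogous to a
    // numeric range query "weight > minExclusive" on author documents.
    static Set<String> authorIdsHeavierThan(
            TreeMap<Integer, List<String>> authorIdsByWeight, int minExclusive) {
        Set<String> ids = new HashSet<String>();
        // tailMap(key, false) yields entries with weight strictly above key.
        for (List<String> bucket
                : authorIdsByWeight.tailMap(minExclusive, false).values()) {
            ids.addAll(bucket);
        }
        return ids;
    }

    // Step 2: keep the books whose author ID was selected in step 1.
    // Books are returned in the iteration order of the given map.
    static List<String> booksByAuthors(
            Map<String, String> authorIdByBook, Set<String> authorIds) {
        List<String> books = new ArrayList<String>();
        for (Map.Entry<String, String> e : authorIdByBook.entrySet()) {
            if (authorIds.contains(e.getValue())) {
                books.add(e.getKey());
            }
        }
        return books;
    }

    public static void main(String[] args) {
        // Made-up sample data: weight in lbs -> author IDs.
        TreeMap<Integer, List<String>> byWeight =
                new TreeMap<Integer, List<String>>();
        byWeight.put(180, Arrays.asList("a1"));
        byWeight.put(200, Arrays.asList("a2"));
        byWeight.put(230, Arrays.asList("a3"));

        // Made-up sample data: book ID -> author ID (sorted by book ID).
        Map<String, String> books = new TreeMap<String, String>();
        books.put("b1", "a1");
        books.put("b2", "a3");
        books.put("b3", "a3");
        books.put("b4", "a2");

        Set<String> heavy = authorIdsHeavierThan(byWeight, 200);
        System.out.println(booksByAuthors(books, heavy)); // prints [b2, b3]
    }
}
```

The sorted map is what makes step 1 cheap: only the entries above the bound
are visited, rather than every author being tested individually.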

> Grant Ingersoll-6 wrote:
> >
> > This strikes me as an example of
> > http://people.apache.org/~hossman/#xyproblem
> >    Namely, you've declared the solution you would like, but haven't
> > told us the problem.
> >
> > I highly doubt that double loop is going to scale.  It wouldn't scale
> > in a database, either, so it makes me think we need to take a step
> > back and ask a bit more about the problem you are trying to solve and
> > not the solution.  Can you share more details about it?
> >
> > On Jul 26, 2009, at 6:14 PM, Edoardo Marcora wrote:
> >
> >>
> >> type:foo and type:bar are fields used to represent documents of different
> >> "kind" (it could be "author" and "book"). field2 and field1 contain IDs
> >> which I would like to use to join the two "kinds".
> >>
> >>
> >> Ken Krugler wrote:
> >>>
> >>>> awarnier wrote:
> >>>>>
> >>>>> Edoardo Marcora wrote:
> >>>>>> I am faced with the requirement for a boolean query composed of
> >>>>>> 50,000 clauses (all of them directed at the same field) all OR'ed
> >>>>>> together.
> >>>>>>
> >>>>> By pure intellectual curiosity : can you provide some idea of the
> >>>>> type of query, and the type of content of the field this is targeted
> >>>>> at ?
> >>>>> I have this notion that with 50,000 queries directed at one field,
> >>>>> there must be some smarter way of handling this than just OR-ing
> >>>>> together the results.
> >>>>>
> >>>>>
> >>>>
> >>>> What I would like to do is to take the results of one query and use
> >>>> one of its fields (not the docid) as an argument to another query
> >>>> (much like a subquery in SQL). For example:
> >>>>
> >>>> type:foo AND (_query_:type:bar AND field2:{field1})
> >>>>
> >>>> This should search for all types of foo and then iterate over the
> >>>> result set and perform a query for where type is bar and field2 is
> >>>> equal to the value of field1 from each item of the first result set.
> >>>
> >>> This looks like a "more like this" (MLT) query, where you restrict the
> >>> set to documents that have matching types...though I don't understand
> >>> the type:foo AND type:bar query, unless 'type' is a multi-value
> >>> field.
> >>>
> >>> From what I remember of using MLT support in Lucene a few years back,
> >>> this takes the terms of the target field from the target document,
> >>> tosses out stop words, and then uses some arbitrary limit (e.g. 500)
> >>> for the first N terms used to do the query.
> >>>
> >>> Unless the distribution of terms in the field is heavily skewed, this
> >>> gives you pretty good results. I suppose you could use the N most
> >>> common terms - but if your stop word list isn't good, you'll get worse
> >>> results.
> >>>
> >>> In any case, preprocessing the field will speed things up, versus
> >>> doing any analysis/stop word/frequency calculations at query time.
> >>>
> >>> -- Ken
> >>> --
> >>> Ken Krugler
> >>> <http://ken-blog.krugler.org>
> >>> +1 530-265-2225
> >>>
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/Boolean-query-with-50%2C000-clauses%21-Possible--Scalable--tp24664839p24671050.html
> >> Sent from the Lucene - General mailing list archive at Nabble.com.
> >>
> >
> > --------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> >
> > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> > using Solr/Lucene:
> > http://www.lucidimagination.com/search
> >
> >
> >
> 
> --
> View this message in context:
> http://www.nabble.com/Boolean-query-with-50%2C000-clauses%21-Possible--Scalable--tp24664839p24701672.html
> Sent from the Lucene - General mailing list archive at Nabble.com.
