lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Design guidance - search strategy
Date Thu, 04 Dec 2008 22:12:18 GMT
See the class in the docs or Lucene In Action for more
detail, but here's the short form.....

A Filter is a bitset where each bit's ordinal position stands
for a document. I.e. bit 1 means doc id 1, bit 519
represents document 519 etc.

When you pass a filter to one of the search routines that accepts a Filter,
the search is restricted to ONLY those documents with the corresponding
bit set in the Filter. Which sounds like just what you want, but I may be
wrong.

You construct a Filter by iterating over the terms you care about (see
TermDocs/TermEnum classes).  In a nutshell you find the terms you
care about, and for each document that contains that term, set the
bit in your filter. This is actually much faster than you might think. All
this is provided for you in the two classes above, how you use them
depends (tm). Watch that you don't run past the term you care about, I
remember having that happen in one of those classes but the details
escape my aging memory.

Hope that helps
Erick

On Thu, Dec 4, 2008 at 4:20 PM, Ian Vink <ianvink@gmail.com> wrote:

> So, let me get this straight.  :)
>
> A Query tells Lucene what to search for. Then a Filter tells lucene what?
>
> I think I'm missing understanding about what a Filter is for.
>
> Ian
>
>
>
> On Thu, Dec 4, 2008 at 9:36 AM, Erick Erickson <erickerickson@gmail.com
> >wrote:
>
> > It's generally a bad idea to iterate a Hits object. In fact, Hits
> > is deprecated in recent versions of Lucene. The underlying
> > problem is that the query is re-executed every 100 responses
> > or so.
> >
> > First suggestion, create a Filter by iterating over your
> > docid field and use that in your searches see
> > several of the Searcher.search variants.
> >
> > Second suggestion, use one of the collector classes rather than
> > Hits, e.g. TopDoc*, TopFieldDoc*, whichever suits.
> >
> >
> > Best
> > Erick
> >
> > On Thu, Dec 4, 2008 at 7:59 AM, Ian Vink <ianvink@gmail.com> wrote:
> >
> > > I have documents with this simple schema in Lucene which I can not
> > change.
> > > docid: (int)
> > > contents: (text)
> > >
> > > The user is given a list of 10,000 documents in a tree which they
> select
> > to
> > > search, usually they select 5000 or so.
> > >
> > > I only want to search those 5000 documents. I have the 'id' fields.
> That
> > is
> > > all.
> > >
> > > I do this now:
> > >
> > > Get the 'Hits' for all documents.
> > > Loop through all Hits looking for any 'docid' that is in the 5000
> > selected
> > > by the user
> > > Add found docs to a collection of found documents and return that to
> the
> > > UI.
> > >
> > >
> > > Is there a better way of doing this?
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message