lucene-java-user mailing list archives

From "Ian Vink" <ianv...@gmail.com>
Subject Re: Design guidance - search strategy
Date Fri, 05 Dec 2008 02:02:15 GMT
I bought your book :)
Thanks, I will look into it.

On Thu, Dec 4, 2008 at 6:12 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> See the Filter class in the docs or Lucene in Action for more
> detail, but here's the short form.
>
> A Filter is a bitset in which each bit's position stands for a
> document, i.e. bit 1 means doc id 1, bit 519 means doc id 519,
> and so on.
>
> When you pass a filter to one of the search routines that accepts a Filter,
> the search is restricted to ONLY those documents with the corresponding
> bit set in the Filter. Which sounds like just what you want, but I may be
> wrong.
>
> You construct a Filter by iterating over the terms you care about (see
> the TermDocs/TermEnum classes). In a nutshell, you find the terms you
> care about and, for each document that contains one of those terms, set
> the corresponding bit in your filter. This is actually much faster than
> you might think. The two classes above provide everything you need; how
> you use them depends (tm). Watch that you don't run past the term you
> care about; I remember that happening with one of those classes, but the
> details escape my aging memory.
>
> Hope that helps
> Erick
>
> On Thu, Dec 4, 2008 at 4:20 PM, Ian Vink <ianvink@gmail.com> wrote:
>
> > So, let me get this straight.  :)
> >
> > A Query tells Lucene what to search for. Then a Filter tells Lucene what?
> >
> > I think I'm missing what a Filter is for.
> >
> > Ian
> >
> >
> >
> > On Thu, Dec 4, 2008 at 9:36 AM, Erick Erickson <erickerickson@gmail.com>
> > wrote:
> >
> > > It's generally a bad idea to iterate a Hits object. In fact, Hits
> > > is deprecated in recent versions of Lucene. The underlying
> > > problem is that the query is re-executed every 100 responses
> > > or so.
> > >
> > > First suggestion: create a Filter by iterating over your
> > > docid field and use that in your searches; see the
> > > Searcher.search variants that accept a Filter.
> > >
> > > Second suggestion: use one of the collector classes rather than
> > > Hits, e.g. TopDoc*, TopFieldDoc*, whichever suits.
> > >
> > >
> > > Best
> > > Erick
> > >
> > > On Thu, Dec 4, 2008 at 7:59 AM, Ian Vink <ianvink@gmail.com> wrote:
> > >
> > > > I have documents with this simple schema in Lucene, which I cannot
> > > > change:
> > > > docid: (int)
> > > > contents: (text)
> > > >
> > > > The user is given a list of 10,000 documents in a tree, from which
> > > > they select those to search; usually they select 5000 or so.
> > > >
> > > > I only want to search those 5000 documents. I have the 'id' fields.
> > > > That is all.
> > > >
> > > > I do this now:
> > > >
> > > > Get the 'Hits' for all documents.
> > > > Loop through all Hits looking for any 'docid' that is in the 5000
> > > > selected by the user.
> > > > Add found docs to a collection of found documents and return that to
> > > > the UI.
> > > >
> > > >
> > > > Is there a better way of doing this?
> > > >
> > >
> >
>
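
As a rough illustration of the Filter idea described in the thread (one bit per document, search restricted to documents whose bit is set), here is a minimal sketch that uses only java.util.BitSet. It is a conceptual model, not Lucene code: the class FilterSketch, its tiny in-memory "index", and the methods buildFilter and search are all hypothetical names made up for the example. One thing the sketch makes explicit is that the bit position is the internal document number (here, the array position), which is separate from the value stored in the docid field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Set;

// Conceptual model only: none of these names are Lucene classes.
class FilterSketch {
    // Hypothetical five-document index. The array position plays the role
    // of the internal document number; DOC_IDS holds the "docid" field.
    static final int[] DOC_IDS = {101, 102, 103, 104, 105};
    static final String[] CONTENTS = {
        "lucene filter basics",
        "searching with lucene",
        "cooking with garlic",
        "lucene bitset filter",
        "garden filter pumps"
    };

    // Build the filter: set one bit for every document whose docid
    // is among the ids the user selected.
    static BitSet buildFilter(Set<Integer> selectedDocIds) {
        BitSet bits = new BitSet(DOC_IDS.length);
        for (int doc = 0; doc < DOC_IDS.length; doc++) {
            if (selectedDocIds.contains(DOC_IDS[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Look for a word, visiting only documents whose bit is set.
    // Returns the docid values of the matching documents.
    static List<Integer> search(String word, BitSet filter) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = filter.nextSetBit(0); doc >= 0; doc = filter.nextSetBit(doc + 1)) {
            if (Arrays.asList(CONTENTS[doc].split(" ")).contains(word)) {
                hits.add(DOC_IDS[doc]);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        BitSet filter = buildFilter(Set.of(101, 104, 105));
        // Document 102 also contains "lucene" but is filtered out.
        System.out.println(search("lucene", filter)); // prints [101, 104]
    }
}
```

In real Lucene code the bits would presumably be set while walking the documents for each selected docid term, as the thread suggests with TermDocs/TermEnum; the loop over DOC_IDS above merely stands in for that step.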
