lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Peter Keegan" <peterlkee...@gmail.com>
Subject Re: Announcement: Lucene powering Monster job search index (Beta)
Date Fri, 16 Mar 2007 19:36:33 GMT
Dan,

The filtering is done in the HitCollector by the bounding box, so the only
hits that get collected are those that match the keywords, the bounding box,
and some Lucene filters (BitSets) (I'm probably overloading the word
'filter' a bit). So, the only hits from the collector that need to be sorted
are those that are roughly within the search radius. When the search radius
gets larger, a new bounding box is computed for that query. Make sense?

Peter

On 3/16/07, Daniel Rosher <rosherd@googlemail.com> wrote:
>
> Hi Peter,
>
> Shouldn't the search perform the euclidean distance during filtering as
> well
> though, otherwise you will obtain perhaps highly relevant hits reported to
> the user outside the range they specified? Particularly as the search
> radius
> gets larger.
>
> Cheers,
> Dan
>
> On 1/28/07, Peter Keegan <peterlkeegan@gmail.com> wrote:
> >
> > Correction:
> > We only do the euclidan computation during sorting. For filtering, a
> > simple
> > bounding box is computed to approximate the radius, and 2 range
> > comparisons
> > are made to exclude documents. Because these comparisons are done
> outside
> > of
> > Lucene as integer comparisons, it is pretty fast. With 13000 results,
> the
> > seach time with distance sort is about 200 msec (compared to 30 ms for a
> > simple non-radius, date-sorted keyword search).
> >
> > Peter
> >
> > On 1/27/07, no spam <mrs.nospam@gmail.com> wrote:
> > >
> > > Isn't this extremely ineffecient to do the euclidean distance twice?
> > > Perhaps not a huge deal if a small search result set.  I at times have
> > > 13,000 results that match my search terms of an index with 1.2 million
> > > docs.
> > >
> > > Can't you do some simple radian math first to ensure it's way out of
> > > bounds,
> > > then do the euclidian distance for the subset within bounds?  I'm
> > > currently
> > > only doing the distance calc once (post hit collector). I don't have
> any
> > > performance numbers with the double vs single distance calc.
> > >
> > > I'm still working out the sort by radius myself.
> > >
> > > Mark
> > >
> > > On 11/3/06, Peter Keegan <peterlkeegan@gmail.com> wrote:
> > > >
> > > > Daniel,
> > > > Yes, this is correct if you happen to be doing a radius search and
> > > sorting
> > > > by mileage.
> > > > Peter
> > > >
> > > >
> > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message