lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: Search result ordering
Date Wed, 29 Apr 2009 20:22:30 GMT
I really doubt boosting at index time will help. All that expresses is
that "this document's title (say) is more important *when calculating score*
than other documents with a smaller title boost".

But since you're not searching on your key (I assume), boosting
at index time would be irrelevant to score calculations that didn't
search on your key. I doubt you want to go there...

So I think you're really back to sorting on your key, I don't know
of any other way to get what you want.

Unless you want to be really adventurous and your index is pretty
static (or at least you can regenerate the whole thing "often enough").
NOTE: this scheme *will not work* if you expect to add to and/or
update your index without rebuilding it from scratch.

In a nutshell, pre-sort your corpus and add the docs in the order
of your key. Now for your index, the following holds true:

if (key1 < key2) then (id1 < id2).

where id is the Lucene-assigned doc ID.

You can then sort the documents by INDEXORDER (see the Sort
API in the javadocs), which should, at least, be very fast and use
a minimal amount of memory. Whether this is enough of a gain
over just sorting on your key to justify the complexity I leave
as an exercise for the reader <G>.

I should stress, though, that it won't work if you alter your index
*at all* after you build it. Updating a document is really a delete/add
so after updating, that document would be at the end every time.
Similarly if you add new documents, each and every one will
be after the last document in your original index.

I suppose I should also ask whether you're really seeing
performance issues or whether this is theoretical. I'm a big
advocate of avoiding optimizations until you have a reasonable
certainty that you need it. Premature optimization and all that.

Best
Erick

On Wed, Apr 29, 2009 at 3:19 PM, <Bill.Chesky@sungard.com> wrote:

> Thanks Erick,
>
> Basically, the ideal ordering is an alphabetical one based on a String
> value that is known at index creation.  I was just wondering if there was
> anything I could do at index creation time that might help me enforce that
> ordering at query time (without using a Sort).  To be honest, I haven't had
> to deal much w/ scoring in my work w/ Lucene.  Our app just searches based
> on some set of criteria and the results returned are all pretty much equal.
>  However, we want to list them alphabetically so they don't appear in a
> jumbled, seemingly random order.
>
> Your comment about boost and scoring prompted me to read up a bit on it and
> it got me wondering if maybe boost could be used somehow.  E.g. once all the
> docs have been added to the index and I know the order I want, I could go
> back and set the boost for each document accordingly.  But maybe this is a
> naïve or innapriate use of boost.
>
> Thanks for the tip on Hits.  We're using 2.2.0 and should probably upgrade.
>  We're at a point in development, though, where we want to keep new
> variables to a minimum.  Interestingly I don't see much of a difference when
> paging through hits 1-10 vs. hits 300-310.  They all seem to take about the
> same time to evaluate.  I'll try using one of the HitCollectors as you
> suggest to see if it makes a difference.
>
> regards,
>
> --
> Bill Chesky
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Wednesday, April 29, 2009 1:46 PM
> To: java-user@lucene.apache.org
> Subject: Re: Search result ordering
>
> People (including me) use Lucene to page through results all the time,
> so I'm pretty sure you're OK.
>
> so here's my answers...
> (1) yes.
> (2) Well, the default sort is by score so if you want some other
>     ordering you have to sort.
> (3) You can boost things at index time, but I don't think that's at all
>     relevant. What order are you trying to enforce that you know
>     enough about at index time to specify?
>
> Do note, though, that Hits is deprecated. The problem is that
> Hits was intended to be reasonable *only* when accessing the first
> few documents. If you're paging far into the result set, be aware that
> using Hits will re-execute the entire query every 100 (200?) results you
> throw away. Think about one of the HitCollectors
> (perhaps TopDocCollector) instead.
>
> Best
> Erick
>
>
> On Wed, Apr 29, 2009 at 1:01 PM, <Bill.Chesky@sungard.com> wrote:
>
> > Hello,
> >
> > I have a few questions about the ordering of search results:
> >
> > 1) Given a query, are the Documents contained in the Hits object that is
> > returned by IndexSearcher.search(Query query) guaranteed to be in the
> > same order from one call to the next (assuming the index has not been
> > updated in the meantime)?
> > 2) Assuming I don't use the IndexSearcher.search(Query query, Sort sort)
> > method, is the ordering of Documents in the Hits object predictable at
> > all?
> > 3) Short of using the IndexSearcher.search(Query query, Sort sort), is
> > there any way to influence the ordering of the Documents in the Hits
> > object?  E.g. is there anything that I can do when creating and/or
> > updating the index that will guarantee a certain ordering of results at
> > query time?
> >
> > For what it's worth, we're using Lucene in conjuction w/ a relational
> > database.  Our index has an 'id' field that maps to a row in a
> > relational database table.  Since Lucene queries are so quick we've
> > found performance to be much better if we do a pure Lucene query to find
> > the docs we need then do a simple SQL query with a "where id in (...)"
> > clause.   We've also wrapped this in an interface that implements a
> > relational-like limit/offset functionality.  So it's important to us
> > that at the very least the query returns results in the same order each
> > time.  Ideally, we'd like to order the results on a String field we have
> > on each document.  This actually works, however it slows things down a
> > bit.  This is understandable, course -- and I'm actually very impressed
> > how quickly it does perform -- however, we're trying to squeeze as many
> > cycles as possible out it to give the best user experience.   Just
> > wondering if there is anything else we might try.
> >
> > thanks,
> >
> > Bill
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message