lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael D. Curtin" <m...@curtin.com>
Subject Re: Custom Sorting
Date Mon, 20 Feb 2006 15:21:43 GMT
SOME ONE wrote:

> Hi,
> 
> Yes, my queries are like the first case. And as there
> have been no other suggestions to do it in a single
> search operation, will have to do it the way you
> suggested. This technique will do the job particularly
> because title's text is always in the body as well. So
> finally I will have to run two search operations like
> 
> (body:a AND body:b AND body:c) AND
> (title:a OR title:b OR title:c)
> 
> to get the first group of results for title match, and
> 
> (body:a AND body:b AND body:c) NOT
> (title:a OR title:b OR title:c)
> 
> to get the second group of results with just body
> match, and sort each group by date.

Here's another way.  Search just the first part of the first query, i.e. 
(body:a AND body:b AND body:c).  Then use an IndexReader, TermDocs and related 
objects to get a list of document numbers for each of title:a, title:b, and 
title:c.  Then merge / compare the lists to separate out the title matches and 
the body-only matches.  Doing it this way will save whatever resources are 
consumed by score computations.  Resources needed by the ORing will still 
basically be consumed, because of the merge / compare step you must perform.

Of course, this approach wouldn't work if the query terms aren't simple terms, 
i.e. you use ranges, or wildcards, or prefixes, etc.  If you do use constructs 
like those, then I don't see an alternative to the two-search algorithm. 
Unless you have hundreds of millions of documents, each of the two searches 
could be simpler than what you have above, though (see my earlier email for 
details).  The way they're set up above, 2 sets of reads through the indexes 
are necessary, and 2 sets of boolean operations.  The way I suggested earlier 
would have only 1 read of the index and 1 merge / compare computation.  if you 
have a few tens of millions of documents, or less, the lists of document 
numbers should easily fit in RAM.

--MDC

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message