lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: EarlyTerminatingSortingCollector help needed..
Date Sat, 21 Jun 2014 13:11:33 GMT
Hi Ravikumar,

On Fri, Jun 20, 2014 at 12:14 PM, Ravikumar Govindarajan
<ravikumar.govindarajan@gmail.com> wrote:
> If my "numDocsToCollect" = 50 and the number of segments = 15, then
> collector.collect() will be called 750 times.

That is indeed the worst case. However, if some of your segments have
fewer than 50 matches, `collect` will only be called on those matches.
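To make the arithmetic concrete, here is a toy model in plain Java (no Lucene dependency; the class and method names are hypothetical) of per-segment early termination: each segment contributes at most numDocsToCollect calls to `collect`, and never more than the number of matches it actually has.

```java
import java.util.Arrays;

/**
 * Hypothetical model of per-segment early termination: each segment is assumed
 * to be sorted already, so collection in that segment stops after
 * numDocsToCollect hits.
 */
final class EarlyTerminationCollectCount {

    /** Returns how many times collect() would run across all segments. */
    static long collectCalls(int[] matchesPerSegment, int numDocsToCollect) {
        long calls = 0;
        for (int matches : matchesPerSegment) {
            // A segment contributes at most numDocsToCollect calls,
            // and only as many as it actually has matches.
            calls += Math.min(matches, numDocsToCollect);
        }
        return calls;
    }

    public static void main(String[] args) {
        int[] busySegments = new int[15];
        Arrays.fill(busySegments, 1_000);             // every segment has >= 50 matches
        System.out.println(collectCalls(busySegments, 50)); // 750, the worst case

        int[] mixed = {1_000, 10, 0, 49, 200};        // some segments have < 50 matches
        System.out.println(collectCalls(mixed, 50));  // 50 + 10 + 0 + 49 + 50 = 159
    }
}
```

So 15 x 50 = 750 is only an upper bound; small segments pull the total down.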

> When I use a SortField instead, then TopFieldDocs does the sorting for all
> segments and collector.collect() will be called only 50 times...

What do you mean by "When I use a SortField instead"? Unless you are
using early termination, Collector.collect is supposed to be called
for every matching document.
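A sorting top-N collector without early termination still has `collect` invoked once per match; only the list of hits it returns is capped at N. The sketch below is a simplified stand-in for what a TopFieldCollector-style collector does (hypothetical class, plain Java, a single long sort key in ascending order), not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sorting top-n collector WITHOUT early termination. */
final class SortingTopNSketch {
    private static final class Hit {
        final int doc;
        final long sortValue;
        Hit(int doc, long sortValue) { this.doc = doc; this.sortValue = sortValue; }
    }

    private final int n;
    // Max-heap on sortValue: the head is the worst hit currently retained.
    private final PriorityQueue<Hit> queue;
    private long collectCalls = 0;

    SortingTopNSketch(int n) {
        this.n = n;
        this.queue = new PriorityQueue<>(n, (a, b) -> Long.compare(b.sortValue, a.sortValue));
    }

    /** Invoked for EVERY matching document; nothing here stops collection early. */
    void collect(int doc, long sortValue) {
        collectCalls++;
        if (queue.size() < n) {
            queue.add(new Hit(doc, sortValue));
        } else if (sortValue < queue.peek().sortValue) {
            queue.poll();                         // evict the current worst hit
            queue.add(new Hit(doc, sortValue));
        }
    }

    long collectCalls() { return collectCalls; }

    /** The retained top-n doc ids, best (smallest sortValue) first. */
    List<Integer> topDocs() {
        List<Hit> hits = new ArrayList<>(queue);
        hits.sort((a, b) -> Long.compare(a.sortValue, b.sortValue));
        List<Integer> docs = new ArrayList<>();
        for (Hit h : hits) docs.add(h.doc);
        return docs;
    }

    public static void main(String[] args) {
        SortingTopNSketch collector = new SortingTopNSketch(50);
        for (int doc = 0; doc < 750; doc++) {
            collector.collect(doc, 749 - doc);    // later docs sort first
        }
        System.out.println(collector.collectCalls());    // 750
        System.out.println(collector.topDocs().size());  // 50
        System.out.println(collector.topDocs().get(0));  // 749
    }
}
```

The returned hits number 50, but `collect` ran 750 times: a short result list does not mean fewer calls.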

> Assuming a stored-field seek for every collector.collect(), will it be
> advisable to still persist with ETSC? Was it introduced as a trade-off between
> memory and disk?

I would not advise using the stored fields API, even in the context
of early termination. Doc values should be more efficient here.
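The intuition behind preferring doc values is layout: stored fields keep all fields of a document together (row-oriented), so reading one sort key means loading and decoding the whole document, while doc values keep one field for all documents addressable by doc id (column-oriented). The toy sketch below only illustrates that difference; the class, the "k=v;k=v" record format and the field names are hypothetical and are not Lucene's on-disk formats. In Lucene itself the column would be read through the doc values API (for example `NumericDocValues`).

```java
import java.util.HashMap;
import java.util.Map;

/** Toy contrast between a row-oriented and a column-oriented layout. */
final class RowVsColumnSketch {
    // Row-oriented, stored-fields-like: one serialized record per document.
    private final String[] storedDocs;
    // Column-oriented, doc-values-like: a single field for all docs, indexed by doc id.
    private final long[] dateColumn;

    RowVsColumnSketch(String[] storedDocs, long[] dateColumn) {
        this.storedDocs = storedDocs;
        this.dateColumn = dateColumn;
    }

    /** Stored-fields style: decode every field of the document to read one value. */
    long sortKeyFromStoredFields(int doc) {
        Map<String, String> fields = new HashMap<>();
        for (String pair : storedDocs[doc].split(";")) {
            String[] kv = pair.split("=", 2);
            fields.put(kv[0], kv[1]);
        }
        return Long.parseLong(fields.get("date"));
    }

    /** Doc-values style: one array lookup by doc id. */
    long sortKeyFromDocValues(int doc) {
        return dateColumn[doc];
    }

    public static void main(String[] args) {
        String[] docs = {
            "title=first;body=a long body;date=20140621",
            "title=second;body=another long body;date=20140620"
        };
        long[] dates = {20140621L, 20140620L};
        RowVsColumnSketch store = new RowVsColumnSketch(docs, dates);
        System.out.println(store.sortKeyFromStoredFields(1)); // 20140620
        System.out.println(store.sortKeyFromDocValues(1));    // 20140620
    }
}
```

Both paths return the same key, but the stored-fields path touches every field of the document on each `collect`.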

The trade-off is not really about memory and disk. What it tries to
achieve is to make queries much faster, provided that:
 - you can afford the merging overhead (i.e. for heavy indexing
workloads, this might not be the best solution)
 - there is a single sort order that is used for most queries
 - you don't need any feature that requires collecting all documents
(like computing the total hit count or facets).
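The last two conditions can be seen in a toy model, assuming every segment's matches are already ordered by the same key the query sorts on (plain Java, hypothetical names, not Lucene's implementation): each segment stops after n hits and the per-segment winners are merged. The global top n is still correct, but the number of documents visited is not the total hit count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model: each segment's matching sort values are already in ascending order. */
final class EarlyTerminationMergeSketch {

    static final class Result {
        final long[] topValues;  // global top-n sort values, ascending
        final int collected;     // documents actually visited
        Result(long[] topValues, int collected) {
            this.topValues = topValues;
            this.collected = collected;
        }
    }

    static Result search(long[][] sortedSegments, int n) {
        List<Long> candidates = new ArrayList<>();
        int collected = 0;
        for (long[] segment : sortedSegments) {
            int limit = Math.min(n, segment.length);
            for (int i = 0; i < limit; i++) {   // early termination: skip the rest
                candidates.add(segment[i]);
                collected++;
            }
        }
        // Merge the per-segment winners into a global top-n.
        candidates.sort(null);                  // natural (ascending) order
        long[] top = new long[Math.min(n, candidates.size())];
        for (int i = 0; i < top.length; i++) top[i] = candidates.get(i);
        return new Result(top, collected);
    }

    public static void main(String[] args) {
        long[][] segments = {
            {1, 4, 7, 10, 13},
            {2, 5, 8, 11},
            {3, 6, 9, 12, 14, 15}
        };
        Result r = search(segments, 3);
        System.out.println(Arrays.toString(r.topValues)); // [1, 2, 3]
        System.out.println(r.collected);                   // 9, although 15 documents match
    }
}
```

If a query sorted on a different key than the segments are ordered by, the first n hits of each segment would no longer be its best n, and the merged result could be wrong.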

-- 
Adrien


