lucene-java-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject EarlyTerminatingSortingCollector help needed..
Date Fri, 20 Jun 2014 10:14:02 GMT
I was planning to use ETSC (EarlyTerminatingSortingCollector) in conjunction
with SortingMergePolicy and got stuck.

In ETSC, we have:

  @Override
  public void collect(int doc) throws IOException {
    in.collect(doc);
    if (++numCollected >= numDocsToCollect) {
      throw new CollectionTerminatedException();
    }
  }
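To make the per-segment behaviour concrete, here is a minimal, self-contained Java model of the collect() logic quoted above. It does not use Lucene: DocCollector, EarlyTerminatingCollector, startSegment() and search() are hypothetical names, and the local CollectionTerminatedException only stands in for Lucene's class. It assumes that a termination exception ends only the current segment and that collection then resumes with the next segment.

```java
import java.util.ArrayList;
import java.util.List;

public class EarlyTerminationSketch {

    // Hypothetical stand-in for Lucene's CollectionTerminatedException (not the real class).
    static class CollectionTerminatedException extends RuntimeException {}

    // Hypothetical minimal collector interface; not Lucene's Collector API.
    interface DocCollector {
        void collect(int doc);
    }

    // Wraps a delegate and stops a segment after numDocsToCollect docs,
    // following the same pattern as the quoted collect() method.
    static class EarlyTerminatingCollector implements DocCollector {
        private final DocCollector in;
        private final int numDocsToCollect;
        private int numCollected;

        EarlyTerminatingCollector(DocCollector in, int numDocsToCollect) {
            this.in = in;
            this.numDocsToCollect = numDocsToCollect;
        }

        // Hypothetical hook called before each segment: the counter is per segment.
        void startSegment() {
            numCollected = 0;
        }

        @Override
        public void collect(int doc) {
            in.collect(doc);
            if (++numCollected >= numDocsToCollect) {
                throw new CollectionTerminatedException();
            }
        }
    }

    // Feeds each segment's (already sorted) doc ids to the collector. The exception
    // only ends the current segment; the loop continues with the next one.
    static List<Integer> search(int[][] segments, int numDocsToCollect) {
        List<Integer> collected = new ArrayList<>();
        EarlyTerminatingCollector etc =
                new EarlyTerminatingCollector(collected::add, numDocsToCollect);
        for (int[] segment : segments) {
            etc.startSegment();
            try {
                for (int doc : segment) {
                    etc.collect(doc);
                }
            } catch (CollectionTerminatedException e) {
                // Segment terminated early; move on to the next segment.
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        int[][] segments = { {0, 1, 2, 3, 4, 5}, {0, 1}, {0, 1, 2, 3} };
        System.out.println(search(segments, 3)); // prints [0, 1, 2, 0, 1, 0, 1, 2]
    }
}
```

Each segment contributes at most numDocsToCollect hits, and a segment with fewer matches simply runs to its end.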

I understand this collector works per segment. I have one doubt about it.

Since a global sort order is difficult to achieve, I collect hits for each
segment and return the final "numDocsToCollect" results using a priority
queue (PQ).
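One way to do the final PQ step is sketched below as a k-way merge in plain Java. It assumes that each segment's hits already arrive in ascending sort order, which is what SortingMergePolicy plus early termination is meant to provide. SegmentHitMerger, Hit and mergeTopN() are hypothetical names, and a long stands in for whatever sort key is actually used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class SegmentHitMerger {

    // Hypothetical hit record: source segment, position within it, and sort value.
    static final class Hit {
        final int segment;
        final int index;
        final long sortValue;

        Hit(int segment, int index, long sortValue) {
            this.segment = segment;
            this.index = index;
            this.sortValue = sortValue;
        }
    }

    // Merges per-segment hit lists, each sorted ascending by sort value,
    // into the global top n.
    static List<Long> mergeTopN(long[][] perSegmentHits, int n) {
        PriorityQueue<Hit> pq =
                new PriorityQueue<>((a, b) -> Long.compare(a.sortValue, b.sortValue));
        // Seed the queue with the first (best) hit of every non-empty segment.
        for (int s = 0; s < perSegmentHits.length; s++) {
            if (perSegmentHits[s].length > 0) {
                pq.add(new Hit(s, 0, perSegmentHits[s][0]));
            }
        }
        List<Long> result = new ArrayList<>();
        while (result.size() < n && !pq.isEmpty()) {
            Hit best = pq.poll();
            result.add(best.sortValue);
            int next = best.index + 1;
            // Advance within the same segment; its next hit is no better than the one taken.
            if (next < perSegmentHits[best.segment].length) {
                pq.add(new Hit(best.segment, next, perSegmentHits[best.segment][next]));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[][] hits = { {1, 4, 9}, {2, 3, 10}, {5, 6} };
        System.out.println(mergeTopN(hits, 4)); // prints [1, 2, 3, 4]
    }
}
```

Because every segment list is already sorted, the queue never holds more than one entry per segment. A simpler alternative is a bounded PQ of size n fed with all collected hits.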

If "numDocsToCollect" = 50 and the number of segments = 15, then
collector.collect() will be called up to 750 times.
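As a worked count: with n = numDocsToCollect, S segments, and h_s matching documents in segment s, the quoted collect() stops each segment after min(n, h_s) calls, so 750 is the upper bound, reached only when every segment has at least 50 matches.

```latex
C \;=\; \sum_{s=1}^{S} \min(n,\, h_s) \;\le\; n \cdot S \;=\; 50 \times 15 \;=\; 750
```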

When I use a SortField instead, the search that returns TopFieldDocs sorts
hits across all segments, and collector.collect() is called only 50 times.

Assuming a stored-field seek for every collector.collect() call, is it still
advisable to persist with ETSC? Was it introduced as a trade-off between
memory and disk?

Any help is much appreciated.

--

Ravi
