lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: splitting docIds from a search by segment [SEC=UNOFFICIAL]
Date Mon, 04 Nov 2013 10:49:06 GMT
On Sun, Nov 3, 2013 at 7:59 PM, Stephen GRAY <stephen.gray@immi.gov.au> wrote:
> UNOFFICIAL
>
> Hi Mike,
>
> I ran it again and this time the two methods came out about the same:
> 168 - 288 ms to process 173,000 documents for the walking method and
> 160 - 205 ms for the MultiDocValues method. I don't know what was
> happening with my last test.

Hmm, still curious.  But it could simply be that the per-doc binary
search is in the noise...

> Here is my code:

The code looks correct, but are you certain the hits come back in
docID order?  Are you sorting by (SortField.FIELD_DOC)?
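
To illustrate why docID order matters, here is a minimal, Lucene-free
sketch of the two approaches. Everything in it is an assumption made for
illustration: the class name, the method names, and the docBases array
(the first global docID of each segment) are hypothetical and are not
the code from this thread. The walking method only advances forward
through the segments, so it gives wrong answers if the hits are not in
ascending docID order; the per-doc binary search has no such
requirement.

```java
import java.util.ArrayList;
import java.util.List;

// Lucene-free sketch. docBases[i] is the first global docID of segment i
// (hypothetical data; in Lucene this would come from the reader's leaves).
final class SegmentSplitSketch {

    // Per-doc binary search: index of the last segment whose docBase <= docId.
    static int segmentOf(int docId, int[] docBases) {
        int lo = 0, hi = docBases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (docBases[mid] <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Walking method: only valid when sortedDocIds is in ascending order,
    // because the segment cursor never moves backwards.
    static List<List<Integer>> splitByWalking(int[] sortedDocIds, int[] docBases) {
        List<List<Integer>> perSegment = new ArrayList<>();
        for (int i = 0; i < docBases.length; i++) {
            perSegment.add(new ArrayList<>());
        }
        int seg = 0;
        for (int docId : sortedDocIds) {
            while (seg + 1 < docBases.length && docId >= docBases[seg + 1]) {
                seg++;
            }
            // Store the segment-local docID.
            perSegment.get(seg).add(docId - docBases[seg]);
        }
        return perSegment;
    }

    public static void main(String[] args) {
        int[] docBases = {0, 100, 250};
        int[] hits = {3, 99, 100, 249, 250, 400};
        System.out.println(splitByWalking(hits, docBases));
        System.out.println(segmentOf(249, docBases));
    }
}
```

If the hits cannot be guaranteed to arrive in docID order, either sort
them first or fall back to the per-doc lookup.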

> Thanks for the tip on using a custom Collector. This is in Lucene in
> Action (great book by the way).

I'm glad to hear that, thanks!

Another option is to fold this processing (looking up the NDV value
for the doc and then doing something) into your Collector: it's
already told whenever it's switching to a new reader, so you'd look up
your NDV instance there, and then in collect(int doc), do your
processing.
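
The shape of that pattern can be sketched without Lucene on the
classpath. This is only a model under stated assumptions: the class
below is hypothetical, a plain long[] stands in for the segment's NDV
instance, and the method names simply mirror the callbacks described
above rather than extending Lucene's real Collector class.

```java
import java.util.function.LongConsumer;

// Lucene-free model of the per-segment Collector pattern.
// All types here are hypothetical stand-ins, not Lucene classes.
final class PerSegmentValueCollector {
    // Stands in for the current segment's per-document numeric values.
    private long[] currentValues;
    // The "doing something" step applied to each hit's value.
    private final LongConsumer action;

    PerSegmentValueCollector(LongConsumer action) {
        this.action = action;
    }

    // Called once whenever the search switches to a new segment:
    // look up that segment's values here, not once per document.
    void setNextReader(long[] segmentValues) {
        this.currentValues = segmentValues;
    }

    // Called once per hit with a segment-local docID.
    void collect(int doc) {
        action.accept(currentValues[doc]);
    }

    public static void main(String[] args) {
        long[] sum = {0};
        PerSegmentValueCollector c = new PerSegmentValueCollector(v -> sum[0] += v);
        c.setNextReader(new long[] {10, 20, 30});
        c.collect(0);
        c.collect(2);
        c.setNextReader(new long[] {5, 7});
        c.collect(1);
        System.out.println(sum[0]);
    }
}
```

Because the per-segment lookup happens once per reader switch, no
docID-to-segment mapping is needed at all in this arrangement.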

Mike McCandless

http://blog.mikemccandless.com


