lucene-java-user mailing list archives

From "Stephen GRAY" <stephen.g...@immi.gov.au>
Subject RE: splitting docIds from a search by segment [SEC=UNOFFICIAL]
Date Tue, 05 Nov 2013 05:27:41 GMT
UNOFFICIAL

Hi Mike,

The hits do seem to come back in docId order. I don't know if they do that every time though.
Might be best to sort them.
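Sorting matters for the "walking" approach discussed further down this thread. That approach moves a segment pointer forward only, so a hit that arrives out of docID order would be assigned to the wrong segment. The following is a minimal sketch, not the code Steve posted. It assumes the hits have already been reduced to an int[] of top-level docIDs (for example, taken from each ScoreDoc.doc) and that each segment's starting docID is available in an array. `splitByWalking` and `docBases` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitBySegment {
    /**
     * Splits top-level hit docIDs into per-segment, segment-local docIDs by walking
     * them once in ascending order. docBases[i] is the first top-level docID of
     * segment i (ascending, docBases[0] == 0). Hypothetical helper for illustration.
     */
    static List<List<Integer>> splitByWalking(int[] hitDocIds, int[] docBases) {
        // Hits may arrive in score order; the single forward walk needs docID order.
        int[] sorted = hitDocIds.clone();
        Arrays.sort(sorted);

        List<List<Integer>> perSegment = new ArrayList<>();
        for (int i = 0; i < docBases.length; i++) {
            perSegment.add(new ArrayList<>());
        }

        int seg = 0;
        for (int doc : sorted) {
            // Advance to the segment containing doc; the pointer never moves backwards.
            while (seg + 1 < docBases.length && doc >= docBases[seg + 1]) {
                seg++;
            }
            perSegment.get(seg).add(doc - docBases[seg]); // segment-local docID
        }
        return perSegment;
    }

    public static void main(String[] args) {
        int[] docBases = {0, 5, 10};  // three segments: [0,5), [5,10), [10,...)
        int[] hits = {12, 3, 7, 0};   // e.g. ScoreDoc.doc values in score order
        System.out.println(splitByWalking(hits, docBases)); // [[0, 3], [2], [2]]
    }
}
```

The walk does one pass over the sorted hits plus at most one step per segment boundary. Without the sort, that guarantee no longer holds.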

Compiling statistics in the collector sounds like a good idea. I might do that.

Thanks,
Steve

-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Monday, 4 November 2013 9:49 PM
To: Lucene Users
Subject: Re: splitting docIds from a search by segment [SEC=UNOFFICIAL]

On Sun, Nov 3, 2013 at 7:59 PM, Stephen GRAY <stephen.gray@immi.gov.au> wrote:
> UNOFFICIAL
>
> Hi Mike,
>
> I ran it again and this time the two methods came out about the same: 168-288 ms to
> process 173,000 documents for the walking method and 160-205 ms for the MultiDocValues
> method. I don't know what was happening with my last test.

Hmm, still curious.  But it could simply be that the per-doc binary search is in the noise...
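The "per-doc binary search" refers to the MultiDocValues approach, which has to work out which segment each top-level docID falls in before it can read the value. The sketch below reproduces that idea with plain arrays so the cost per lookup is visible. It is not Lucene's MultiDocValues implementation. `subIndex`, `get`, `docBases` and the `long[][]` arrays standing in for per-segment doc values are hypothetical.

```java
public class SegmentLookup {
    /**
     * Returns the index of the segment holding top-level docID `doc`, i.e. the last i
     * with docBases[i] <= doc. docBases must be ascending with docBases[0] == 0.
     * Taking the last such i also skips empty segments that share a docBase.
     */
    static int subIndex(int doc, int[] docBases) {
        int lo = 0, hi = docBases.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (docBases[mid] <= doc) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return hi;
    }

    /** Per-doc lookup: binary-search the segment (O(log segments)), then index its values. */
    static long get(int doc, int[] docBases, long[][] segmentValues) {
        int seg = subIndex(doc, docBases);
        return segmentValues[seg][doc - docBases[seg]];
    }

    public static void main(String[] args) {
        int[] docBases = {0, 3, 5};
        long[][] values = {{10, 11, 12}, {20, 21}, {30, 31, 32}}; // stand-in per-segment values
        for (int doc : new int[]{0, 4, 7}) {
            System.out.println(doc + " -> " + get(doc, docBases, values));
        }
    }
}
```

With only a handful of segments the search is a few comparisons per hit, which is consistent with the timings above being close.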

> Here is my code:

The code looks correct, but are you certain the hits come back in docID order?  Are you sorting
by (SortField.FIELD_DOC)?

> Thanks for the tip on using a custom Collector. This is in Lucene in Action (great book
> by the way).

I'm glad to hear that, thanks!

Another option is to fold this processing (looking up the NDV, i.e. NumericDocValues, value
for the doc and then doing something) into your Collector: it's already told whenever it's
switching to a new reader, so you'd look up your NDV instance there, and then in
collect(int doc), do your processing.
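In the Lucene 4.x API, a Collector is notified once per segment through setNextReader(AtomicReaderContext) and then receives segment-local docIDs in collect(int doc). That means the per-segment value source can be fetched once per segment and no docID ever needs mapping back to a segment. The sketch below models that shape without Lucene on the classpath. It uses a hypothetical `SegmentCollector` interface (not Lucene's Collector) and plain `long[]` arrays standing in for each segment's NumericDocValues, and accumulates the kind of statistics Steve mentions compiling in the collector.

```java
import java.util.List;

public class StatsCollectorSketch {
    /** Hypothetical stand-in for the per-segment Collector callbacks (not the real API). */
    interface SegmentCollector {
        void setNextSegment(int docBase, long[] segmentValues); // once per segment
        void collect(int doc);                                  // doc is segment-local
    }

    /** Accumulates count/sum/min/max of a per-document numeric value. */
    static class StatsCollector implements SegmentCollector {
        private long[] values; // current segment's values, fetched once per segment
        long count = 0, sum = 0;
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;

        @Override
        public void setNextSegment(int docBase, long[] segmentValues) {
            // docBase would only be needed to report top-level docIDs; stats don't need it.
            this.values = segmentValues;
        }

        @Override
        public void collect(int doc) {
            long v = values[doc]; // direct per-segment lookup, no segment search
            count++;
            sum += v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
    }

    /** Hypothetical driver: feeds each segment's matching local docIDs to the collector. */
    static void search(List<long[]> segmentValues, List<int[]> segmentHits, SegmentCollector c) {
        int docBase = 0;
        for (int s = 0; s < segmentValues.size(); s++) {
            c.setNextSegment(docBase, segmentValues.get(s));
            for (int doc : segmentHits.get(s)) {
                c.collect(doc);
            }
            docBase += segmentValues.get(s).length;
        }
    }

    public static void main(String[] args) {
        List<long[]> values = List.of(new long[]{4, 8, 15}, new long[]{16, 23});
        List<int[]> hits = List.of(new int[]{0, 2}, new int[]{1});
        StatsCollector stats = new StatsCollector();
        search(values, hits, stats);
        System.out.println(stats.count + " " + stats.sum + " " + stats.min + " " + stats.max);
    }
}
```

In a real Collector the array assignment in setNextSegment would correspond to fetching the segment's NumericDocValues in setNextReader, and the array read in collect to reading that instance's value for the doc.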

Mike McCandless

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


UNOFFICIAL


--------------------------------------------------------------------
Important Notice: If you have received this email by mistake, please advise
the sender and delete the message and attachments immediately.  This email,
including attachments, may contain confidential, sensitive, legally privileged
and/or copyright information.  Any review, retransmission, dissemination
or other use of this information by persons or entities other than the
intended recipient is prohibited.  DIBP respects your privacy and has
obligations under the Privacy Act 1988.  The official departmental privacy
policy can be viewed on the department's website at www.immi.gov.au.  See:
http://www.immi.gov.au/functional/privacy.htm


---------------------------------------------------------------------