lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Possible to "quickly" fetch count of other terms based on a query
Date Fri, 22 Feb 2013 11:20:34 GMT
On Fri, Feb 22, 2013 at 6:08 AM, Lars-Erik Aabech <LEA@markedspartner.no> wrote:
> Thanks.
>
> ANDing was what I meant with "combined" queries.
> I think I'll go with that one for now and see how it performs. Not too many docs/terms in the index. (~1500/30)
>
> Bit sets sound appealing, but I've got no idea how to go about it. :)
> In "Lucene in Action", I only find a short mention of DocIdBitSet.
> Any hints?

You can just create a FixedBitSet of size maxDoc() and then, for each
term, call .or(DocsEnum) with the DocsEnum you pulled for that term.
That gives you one bitset per term.

For a Query, it's a bit trickier: you need to pull its Weight, then
pull a Scorer from that, and then create a FixedBitSet and call
.or(Scorer) to set the bit for every matching doc.

Then you can .and these bitsets together and call .cardinality() to get
the total number of bits set, i.e. the number of docs that match both.
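
As a rough illustration of the and-then-count step only, here is a minimal
sketch that uses java.util.BitSet as a stand-in for Lucene's FixedBitSet, so
it runs without Lucene on the classpath. The class name, the bitsFor helper
and the doc ids are hypothetical; in real code the bits would come from
.or(DocsEnum) or .or(Scorer) as described above. The one design point it
shows is that and() modifies its receiver, so the query bitset is cloned
before being intersected with each term's bitset.

```java
import java.util.BitSet;

// Stand-in sketch: java.util.BitSet plays the role of Lucene's FixedBitSet.
class TermIntersectionCount {

    /** Hypothetical helper: build a bit set with one bit set per given doc id. */
    static BitSet bitsFor(int maxDoc, int... docIds) {
        BitSet bits = new BitSet(maxDoc);
        for (int docId : docIds) {
            bits.set(docId);
        }
        return bits;
    }

    /** Count docs present in both sets without modifying either input. */
    static int countBoth(BitSet queryDocs, BitSet termDocs) {
        // and() works in place, so intersect on a copy to keep queryDocs reusable.
        BitSet both = (BitSet) queryDocs.clone();
        both.and(termDocs);
        return both.cardinality();
    }

    public static void main(String[] args) {
        int maxDoc = 8; // assumed index size, for illustration only
        BitSet queryDocs = bitsFor(maxDoc, 0, 2, 3, 5, 7); // docs matching the query
        BitSet termDocs = bitsFor(maxDoc, 1, 2, 5, 6);     // docs containing one term
        System.out.println(countBoth(queryDocs, termDocs)); // prints 2 (docs 2 and 5)
    }
}
```

With ~30 terms you would build the query bitset once and call countBoth once
per term bitset.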

To get the best performance, you should do this per-segment (i.e., iterate
over IR.leaves() and run the code above for each segment), but for the
easiest-to-write code, you can operate on the top-level reader by
wrapping your IR in SlowCompositeReaderWrapper.
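
The per-segment variant works because every doc lives in exactly one segment,
so the per-segment intersection counts simply add up. The sketch below shows
only that summation, again with java.util.BitSet standing in for FixedBitSet
and with made-up segment contents; the class and method names are
hypothetical, and in real code each pair of bitsets would be filled from one
entry of IR.leaves() using segment-local doc ids.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Stand-in sketch: each list element represents one segment's bitset.
class PerSegmentCount {

    /** Hypothetical helper: bit set containing the given segment-local doc ids. */
    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int docId : docIds) {
            b.set(docId);
        }
        return b;
    }

    /** Sum, over all segments, the number of docs matching both query and term. */
    static long countAcrossSegments(List<BitSet> queryBits, List<BitSet> termBits) {
        if (queryBits.size() != termBits.size()) {
            throw new IllegalArgumentException("need one bitset pair per segment");
        }
        long total = 0;
        for (int i = 0; i < queryBits.size(); i++) {
            // Intersect on a copy so the caller's per-segment bitsets stay intact.
            BitSet both = (BitSet) queryBits.get(i).clone();
            both.and(termBits.get(i));
            total += both.cardinality();
        }
        return total;
    }

    public static void main(String[] args) {
        // Two made-up segments; doc ids are local to each segment.
        List<BitSet> queryBits = Arrays.asList(bits(0, 1, 3), bits(0, 2));
        List<BitSet> termBits = Arrays.asList(bits(1, 2, 3), bits(2));
        System.out.println(countAcrossSegments(queryBits, termBits)); // prints 3
    }
}
```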

Mike McCandless

http://blog.mikemccandless.com
