lucene-java-user mailing list archives

From Michael McCandless
Subject Re: Possible to "quickly" fetch count of other terms based on a query
Date Fri, 22 Feb 2013 10:27:00 GMT
For terms that are in your query, you could use the
Scorer.getChildScorers API up front to hold onto each Scorer and then
in a custom collector check if that Scorer matched this particular
document.

For terms that are not in your query:

You could use term vectors and count up the terms yourself as you go
(in a custom collector), but that'd be insanely slow.

You could create a bit set of all matching docs, and then a bit set
for each of the terms of interest, and intersect them and count the
set bits.
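
A minimal sketch of the bit set idea above, using only java.util.BitSet
as a stand-in for whatever doc-id bit set you build from the index. The
class name, method names and sample doc ids are hypothetical, and nothing
here calls Lucene. Note that it counts matching documents per term, not
occurrences within a document.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: java.util.BitSet stands in for a doc-id bit set built from
// the index. All names below are hypothetical, not Lucene API.
final class BitSetTermCounts {

    /** For each term, counts how many of the query's hits also contain that term. */
    static Map<String, Integer> countTerms(BitSet queryHits, Map<String, BitSet> docsByTerm) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : docsByTerm.entrySet()) {
            BitSet both = (BitSet) queryHits.clone();   // copy so queryHits stays untouched
            both.and(e.getValue());                     // intersect hits with the term's docs
            counts.put(e.getKey(), both.cardinality()); // set bits = docs matching both
        }
        return counts;
    }

    /** Helper that builds a bit set from a list of doc ids. */
    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int id : docIds) {
            b.set(id);
        }
        return b;
    }

    public static void main(String[] args) {
        BitSet hits = bits(0, 2, 3, 5, 8); // made-up doc ids matched by the query
        Map<String, BitSet> docsByTerm = new LinkedHashMap<>();
        docsByTerm.put("barette", bits(2, 3, 4));
        docsByTerm.put("fooish", bits(1, 7));
        System.out.println(countTerms(hits, docsByTerm));
    }
}
```

The query's bit set is cloned for each term because BitSet.and modifies
the receiver in place.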

You could pull the DocsEnum for each term of interest up front, and
then in a custom collector call .advance on each, for each collected
docID, and increment counts if that term matches that doc.
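
The advance-based loop above can be sketched without Lucene as follows.
PostingsCursor is a hypothetical stand-in for a per-term doc-id iterator
such as DocsEnum, backed by a sorted int array, and the sketch assumes
collected doc ids arrive in increasing order. It illustrates the control
flow only, not the real API.

```java
// Hypothetical stand-in for a per-term postings iterator (not Lucene API).
final class PostingsCursor {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docIds; // doc ids containing the term, sorted ascending
    private int pos = -1;       // -1 means the cursor has not been positioned yet

    PostingsCursor(int... sortedDocIds) {
        this.docIds = sortedDocIds.clone();
    }

    /** Current doc id, -1 before the first advance, NO_MORE_DOCS when exhausted. */
    int docID() {
        if (pos < 0) {
            return -1;
        }
        return pos < docIds.length ? docIds[pos] : NO_MORE_DOCS;
    }

    /** Moves forward to the first doc id >= target and returns it. */
    int advance(int target) {
        if (pos < 0) {
            pos = 0;
        }
        while (pos < docIds.length && docIds[pos] < target) {
            pos++;
        }
        return docID();
    }
}

final class AdvanceTermCounts {
    /** For each cursor, counts how many collected docs that term also matches. */
    static int[] count(int[] collectedDocIds, PostingsCursor[] cursors) {
        int[] counts = new int[cursors.length];
        for (int doc : collectedDocIds) {       // assumed to be in increasing order
            for (int i = 0; i < cursors.length; i++) {
                int current = cursors[i].docID();
                if (current < doc) {
                    current = cursors[i].advance(doc); // only ever move forward
                }
                if (current == doc) {
                    counts[i]++;                // the term occurs in this collected doc
                }
            }
        }
        return counts;
    }
}
```

Each cursor only moves forward, so the cost per term is bounded by the
length of its postings plus the number of collected docs.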

Or you could just do a separate query for each of the terms of
interest AND'd with your original query.

Mike McCandless

On Fri, Feb 22, 2013 at 4:14 AM, Lars-Erik Aabech wrote:
> Hi!
> I'm sorry I didn't do any hard research on this, it's so quick to ask. ;)
> Is it possible to somehow find the count of each term in a set for each document returned
> by a query?
> For instance, if I use the query +(foo:bar foo:morebar) +(bar:foo),
> Could I, without fetching all the documents from this query, find the count of occurrences
> of the terms [barette, fooish, bar, morebar, foo]?
> The result I'm after is something like
> barette: 10,
> fooish: 0,
> bar: 5,
> morebar: 8
> foo: 3
> Hope the question is clear enough.
> Any suggestion is welcome.
> I'd prefer not having to build a second index, though..
> (I guess I could do a new "combined" query for each term in the set, but if there is any other
> way it'd be nice)
> Regards,
> Lars-Erik Aabech
> Technical lead, development
> MarkedsPartner AS
> Mobile: +47 920 30 537
