lucene-java-user mailing list archives

From Lars-Erik Aabech <...@markedspartner.no>
Subject RE: Possible to "quickly" fetch count of other terms based on a query
Date Fri, 22 Feb 2013 11:22:29 GMT
I guess it performs alright :P
	Overall Elapsed:	00:00:00.0290029

(29ms)

Lars-Erik

-----Original Message-----
From: Lars-Erik Aabech [mailto:LEA@markedspartner.no] 
Sent: 22 February 2013 12:09
To: java-user@lucene.apache.org
Subject: RE: Possible to "quickly" fetch count of other terms based on a query

Thanks.

ANDing was what I meant by "combined" queries.
I think I'll go with that one for now and see how it performs. Not too many docs/terms in
the index. (~1500/30)

Bit sets sound appealing, but I've got no idea how to go about it. :) In "Lucene in Action",
I only find a short mention of DocIdBitSet.
Any hints?

Lars-Erik

-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: 22 February 2013 11:27
To: java-user@lucene.apache.org
Subject: Re: Possible to "quickly" fetch count of other terms based on a query

For terms that are in your query, you could use the Scorer.getChildScorers API up front to
hold onto each Scorer and then in a custom collector check if that Scorer matched this particular
hit.

For terms that are not in your query:

You could use term vectors and count up the terms yourself as you go (in a custom collector),
but that'd be insanely slow.

You could create a bit set of all matching docs, and then a bit set for each of the terms
of interest, and intersect them and count the set bits.
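
A minimal sketch of the bit-set idea, for illustration only. It uses plain java.util.BitSet rather than any Lucene class, and assumes you have already filled one bit set with the doc IDs matching the query and one per term of interest. The class and method names (BitSetTermCounts, countTerms) are made up for this example:

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: java.util.BitSet stands in for whatever
// doc-ID bit set you build from the index.
class BitSetTermCounts {

    // queryMatches: bit i is set if doc i matched the original query.
    // termDocs:     for each term of interest, bit i is set if doc i contains it.
    // Returns, per term, how many query-matching docs contain that term.
    static Map<String, Integer> countTerms(BitSet queryMatches,
                                           Map<String, BitSet> termDocs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : termDocs.entrySet()) {
            // Clone first, because BitSet.and() modifies the receiver in place.
            BitSet both = (BitSet) queryMatches.clone();
            both.and(entry.getValue());
            // cardinality() is the number of set bits, i.e. the doc count.
            counts.put(entry.getKey(), both.cardinality());
        }
        return counts;
    }
}
```

Note that this counts documents containing the term, not the number of occurrences inside each document.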

You could pull the DocsEnum for each term of interest up front, and then in a custom collector
call .advance on each, for each collected docID, and increment counts if that term matches
that doc.
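
The advance-based counting could look roughly like the sketch below. It does not use Lucene: the Postings class is a hypothetical stand-in for a per-term iterator over sorted doc IDs, with an advance(target) that moves to the first doc >= target, and the collected doc IDs are assumed to arrive in increasing order. In a real collector the doc IDs are per segment, so the iterators would have to be pulled again for each segment.

```java
// Hypothetical sketch: Postings imitates a sorted doc-ID iterator.
class AdvanceCountSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static final class Postings {
        private final int[] docs; // sorted, ascending doc IDs for one term
        private int pos = -1;     // -1 means "not positioned yet"

        Postings(int[] sortedDocs) { this.docs = sortedDocs; }

        int docID() {
            if (pos < 0) return -1;
            return pos >= docs.length ? NO_MORE_DOCS : docs[pos];
        }

        // Move forward to the first doc >= target and return it.
        int advance(int target) {
            pos = Math.max(pos, 0);
            while (pos < docs.length && docs[pos] < target) pos++;
            return docID();
        }
    }

    // collectedDocs: doc IDs matched by the query, in increasing order.
    // Returns, per term, the number of collected docs containing that term.
    static int[] count(int[] collectedDocs, Postings[] terms) {
        int[] counts = new int[terms.length];
        for (int doc : collectedDocs) {
            for (int i = 0; i < terms.length; i++) {
                int cur = terms[i].docID();
                // Only advance when the iterator is behind the collected doc.
                if (cur < doc) cur = terms[i].advance(doc);
                if (cur == doc) counts[i]++;
            }
        }
        return counts;
    }
}
```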

Or you could just do a separate query for each of the terms of interest AND'd with your original
query.

Mike McCandless

http://blog.mikemccandless.com

On Fri, Feb 22, 2013 at 4:14 AM, Lars-Erik Aabech <LEA@markedspartner.no> wrote:
> Hi!
>
> I'm sorry I didn't do any hard research on this, it's so quick to ask. 
> ;)
>
> Is it possible to somehow find the count of each term in a set for each document returned by a query?
>
> For instance, if I use the query +(foo:bar foo:morebar) +(bar:foo),
> could I, without fetching all the documents from this query, find the count of occurrences of the terms [barette, fooish, bar, morebar, foo]?
> The result I'm after is something like
> barette: 10,
> fooish: 0,
> bar: 5,
> morebar: 8,
> foo: 3
>
> Hope the question is clear enough.
> Any suggestion is welcome.
> I'd prefer not having to build a second index, though..
>
> (I guess I could do a new "combined" query for each term in the set,
> but if there is any other way, that'd be nice)
>
> Best regards,
> Lars-Erik Aabech
> Technical lead, development
> MarkedsPartner AS
> Mobile: +47 920 30 537
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
