lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <>
Subject Re: Frequency of phrase
Date Sun, 26 Feb 2006 02:10:26 GMT

: > If you use a span query then you can get the actual number of phrase
: > instances.
: Thanks, good to know!

Just to clarify, i Doug means you can use the getSpans method of a
SpanNear query, and then count the iterations of next() untill you run

Which is a really good idea I hadn't thought of before.

Another idea that was floating arround on the tip of my brain after the
recent comment about using a PhraseQUery and the idf, is that with a
custom Similarity that impliminted all of it's functions as constants, and
a HitCollector that just recorded the sum of the raw scores, you should
also be able to get the exact number of occurances of an (exact)

You'd need to double check the score calculation to figure out exactly
which constants each of the functions should return -- I think it would
mainly be 1.0f, but the sumOfSquares aspect of the equation may make one
of them more interesting.  The only exception would be tf(float) which (I
think) should be an identity function.

Given other comments I've seen from people I trust that PhraseQueries are
faster then SpanNearQueries, that may be faster then the Span appraoch
Doug alluded too -- but it's also more complicated to get working.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message