lucene-dev mailing list archives

From Tatu Saloranta <>
Subject Re: Phrase IDF and collection frequency !
Date Wed, 17 May 2006 00:19:25 GMT
--- ABDOU Samir <> wrote:

> Hi,
> Are there any ideas on how to compute the "document
> frequency" and "collection frequency" of phrases?

Tokenize your input as phrases (instead of words), and
you'll get these stats the same way you normally get
them for single-word tokens (Terms). I did that for
bigram frequency analysis.
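
A minimal sketch of the idea, written in plain Python rather
than against Lucene's API. Each adjacent word pair (bigram) is
treated as one token and counted like any single-word term.
Document frequency counts each document at most once, and
collection frequency counts every occurrence. The regex
tokenizer, the hypothetical helpers `bigram_tokens`,
`phrase_stats` and `idf`, and the textbook idf = log(N/df) are
illustrative assumptions. They are not Lucene's analyzer or its
scoring formula.

```python
import math
import re
from collections import Counter

def bigram_tokens(text):
    """Hypothetical helper: lowercase the text, split it into words,
    and emit each adjacent word pair as a single 'phrase' token.
    Sentence boundaries are ignored, so pairs like 'big i' also appear."""
    words = re.findall(r"\w+", text.lower())
    return [f"{a} {b}" for a, b in zip(words, words[1:])]

def phrase_stats(docs):
    """Return (document_frequency, collection_frequency) Counters for bigrams."""
    df = Counter()
    cf = Counter()
    for doc in docs:
        tokens = bigram_tokens(doc)
        cf.update(tokens)       # every occurrence adds to collection frequency
        df.update(set(tokens))  # each document adds at most 1 to document frequency
    return df, cf

def idf(phrase, df, num_docs):
    """Textbook idf = log(N / df); an assumption, not Lucene's formula."""
    return math.log(num_docs / df[phrase])

docs = [
    "New York is big. I love New York.",
    "York is old.",
    "New ideas in New York.",
]
df, cf = phrase_stats(docs)
print(df["new york"], cf["new york"])  # -> 2 3
```

Only the token unit changes here. The counting is exactly what
you would do for single words, which is why the stats come for
free once the tokenizer emits phrases.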

Of course, the hard part is not getting these stats;
it's deciding what constitutes a phrase. ;-)

-+ Tatu +-
