lucene-dev mailing list archives

From Tatu Saloranta <cowtownco...@yahoo.com>
Subject Re: Phrase IDF and collection frequency !
Date Wed, 17 May 2006 00:19:25 GMT
--- ABDOU Samir <samir.abdou@unine.ch> wrote:

> Hi,
>  
> Are there any ideas on how to compute the "document
> frequency" and "collection frequency" of phrases?

Tokenize your input as phrases (instead of words), and
you'll get these stats the same way you normally get
them for single-word tokens (Terms). I did that for
bigram frequency analysis.
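
As a rough sketch of that idea (plain Python rather than Lucene
code; the helper names `bigrams` and `phrase_stats` and the sample
documents are made up for illustration), treat each adjacent word
pair as a single token and count as usual:

```python
from collections import Counter


def bigrams(text):
    """Lowercase, split on whitespace, and emit adjacent word pairs as single tokens."""
    words = text.lower().split()
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def phrase_stats(docs):
    """Return (document frequency, collection frequency) counters for bigram tokens."""
    doc_freq = Counter()   # number of documents that contain the phrase
    coll_freq = Counter()  # total occurrences of the phrase in the collection
    for text in docs:
        tokens = bigrams(text)
        coll_freq.update(tokens)      # every occurrence counts
        doc_freq.update(set(tokens))  # each phrase counts once per document
    return doc_freq, coll_freq


# Hypothetical sample collection, for illustration only.
docs = [
    "new york is big",
    "i love new york new york",
    "york is old",
]
df, cf = phrase_stats(docs)
print(df["new york"], cf["new york"])
```

Document frequency counts a phrase at most once per document,
while collection frequency counts every occurrence, exactly as
for single-word terms.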

Of course, getting these stats is hardly the problem;
the problem is deciding what constitutes a phrase. ;-)

-+ Tatu +-

