lucene-java-user mailing list archives

From "René Hackl" <rene.a.ha...@gmx.de>
Subject Re: Top most frequent words
Date Thu, 12 May 2005 08:54:57 GMT
Hi John,

> >from a slightly skewed source -- newspapers in a fixed interval 
> perhaps.  (I don't think "Los Angeles" makes it into every day parlance 

You're right there. Most probably the frequencies in that list are based on
a volume of the Los Angeles Times, which is one of the standard CLEF
collections. The Glasgow Herald is the second standard collection; there,
"Scotland" and "scottish" are among the most frequent terms :-)

Maybe the "officials" come from sports articles --> LA --> LA Lakers?!

Cheers,
René



