lucene-dev mailing list archives

From karl wettin <ka...@snigel.dnsalias.net>
Subject Re: N-gram layer
Date Tue, 03 Feb 2004 08:39:41 GMT
On Tue, 03 Feb 2004 09:27:25 +0100
Andrzej Bialecki <ab@getopt.org> wrote:

> 
> A question: what was your source for the representative hi-frequency 
> words in various languages? Was it your training corpus or some publication?

I use the data supplied with Gertjan van Noord's TextCat distribution.

http://odur.let.rug.nl/~vannoord/TextCat/


-- 

karl

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

