lucene-dev mailing list archives

From "Jean-Francois Halleux" <halleux...@skynet.be>
Subject Some results for the language guesser
Date Mon, 16 Feb 2004 20:50:07 GMT
Hello,

I found some time to do some quantitative testing of the language guesser
I contributed some time ago (available in the patch queue :)

I tested with language references for da, de, en, fr, nl and sv. I picked
random strings of varying length from a reference document in a given
language and measured how often the guesser identified that language
correctly. Here are some results.
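
For anyone wanting to repeat this kind of measurement, the sampling
procedure described above can be sketched roughly as follows. This is only
an illustration in Python, not the code used for the numbers in this
message: `guess` stands for any function that maps a string to a language
code (a hypothetical stand-in for the language guesser), and
`sample_substring`, `measure_accuracy`, the trial count and the seed are
all assumed names and values.

```python
import random
from typing import Callable, Dict, Iterable


def sample_substring(text: str, length: int, rng: random.Random) -> str:
    """Return a random contiguous substring of exactly `length` characters."""
    if length > len(text):
        raise ValueError("reference text is shorter than the requested length")
    start = rng.randrange(len(text) - length + 1)
    return text[start:start + length]


def measure_accuracy(
    guess: Callable[[str], str],   # hypothetical guesser: string -> language code
    reference_text: str,           # reference document in one known language
    expected: str,                 # language code of that document, e.g. "fr"
    lengths: Iterable[int],
    trials: int = 10000,           # assumed value, not taken from the message
    seed: int = 0,
) -> Dict[int, int]:
    """Map each string length to the hit rate, scaled as probability x 10000."""
    rng = random.Random(seed)
    results = {}
    for length in lengths:
        hits = sum(
            guess(sample_substring(reference_text, length, rng)) == expected
            for _ in range(trials)
        )
        results[length] = round(10000 * hits / trials)
    return results
```

Calling `measure_accuracy(guess, french_text, "fr", [30, 25, 20])` would
then give one value per length in the same "probability x 10000" form as
the tables in this message, where `french_text` is whatever reference
document is being sampled.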

For French
----------

Length:Probability X 10000

30:9954 (i.e. for a string of length 30, a 99.54% chance that it returns French)
25:9926
20:9890
15:9789
10:9426
9:9209
8:9032
7:8852
6:8544
5:8085
4:7585
3:6732

For English
-----------

30:9960
25:9929
20:9848
10:8983
9:8801
8:8557
7:8240
6:7853
5:7356
4:6523
3:5733

For Danish
----------

30:9854
25:9853
20:9813
15:9664
10:9086
9:8924
8:8738
7:8340
6:7878
5:7374
4:6489
3:5630

For German
----------

30:9935
25:9922
20:9868
15:9715
10:9281
9:9117
8:8921
7:8582
6:8123
5:7545
4:6666
3:5568


Have fun,

Jean-Francois Halleux



