lucene-dev mailing list archives

From Frank Nestel <>
Subject Language recognition
Date Fri, 12 Oct 2001 10:52:31 GMT


This was a thread back when Lucene was still on SourceForge.
I've done a rough but working port of the text_cat Perl
script for n-gram-based language guessing to Java. If this
is useful, it can be found under

There are javadocs and a jar file. The source code
is not yet available since I apparently need to reprogram
a tiny but central class for copyright reasons. This is not
difficult, but I'm preparing for two weeks off right now, so
it won't happen soon.
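
For readers unfamiliar with the technique: n-gram based language guessing of the text_cat kind is usually described as ranking the most frequent character n-grams of a text and comparing those ranks against per-language profiles. The following is only a minimal sketch of that general idea, not the code of the port mentioned above (whose source is not available). The class name NGramGuesser, the method names, the n-gram lengths (1 to 3), and the profile size of 300 are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; NOT the code of the port announced in this message.
// All names and constants here are hypothetical.
class NGramGuesser {
    static final int MAX_N = 3;          // assumed: character n-grams of length 1..3
    static final int PROFILE_SIZE = 300; // assumed: keep the 300 most frequent n-grams

    // Build a profile mapping each kept n-gram to its rank (0 = most frequent).
    static Map<String, Integer> profile(String text) {
        // Lower-case, replace each run of non-letters with '_', and pad both ends with '_'.
        String s = "_" + text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}]+", "_") + "_";

        // Count every n-gram of length 1..MAX_N.
        Map<String, Integer> counts = new HashMap<>();
        for (int n = 1; n <= MAX_N; n++) {
            for (int i = 0; i + n <= s.length(); i++) {
                counts.merge(s.substring(i, i + n), 1, Integer::sum);
            }
        }

        // Sort by descending count; break ties alphabetically so ranks are deterministic.
        List<String> grams = new ArrayList<>(counts.keySet());
        grams.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });

        Map<String, Integer> ranks = new HashMap<>();
        for (int i = 0; i < grams.size() && i < PROFILE_SIZE; i++) {
            ranks.put(grams.get(i), i);
        }
        return ranks;
    }

    // Rank-difference distance: sum of rank differences over the document's n-grams,
    // with a fixed penalty of PROFILE_SIZE for n-grams missing from the language profile.
    static int distance(Map<String, Integer> doc, Map<String, Integer> lang) {
        int d = 0;
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            Integer rank = lang.get(e.getKey());
            d += (rank == null) ? PROFILE_SIZE : Math.abs(rank - e.getValue());
        }
        return d;
    }

    // Return the name of the language profile with the smallest distance,
    // or null if no profiles are given.
    static String guess(String text, Map<String, Map<String, Integer>> languages) {
        Map<String, Integer> doc = profile(text);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, Map<String, Integer>> e : languages.entrySet()) {
            int d = distance(doc, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

In use, one would build a profile per language from a reasonably large sample text, put them in a map keyed by language name, and call guess on the text to classify. Profiles built from only a sentence or two are too small to give reliable guesses.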

For me it was just kind of an exercise, since I later 
realized that I still have a gap to make it work in my SMIDER
project. If someone has a use for such a system, let me know 
so I can readjust this task's priority for myself :-) Maybe
one would even consider this a potential part of Lucene?
Then I'd be glad to give that source code to Apache.


------------------------------------------ooO---"---Ooo-------------------,                "I hate this game, lets play it
Dr. Frank Sven Nestel
Spiele von Doris und Frank, Wolfsstaudenring 32, D-91056 Erlangen,
