lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kevin Burton <burtona...@gmail.com>
Subject Re: NGram Language Categorization Source
Date Sun, 21 Aug 2005 20:48:15 GMT
> * A Nutch implementation:
> http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/plugin/languageidentifier/
> 
> * A Lucene patch: http://issues.apache.org/bugzilla/show_bug.cgi?id=26763

A step in the right direction. It doesn't have other language
categories created though.

> * JTextCat (http://www.jedi.be/JTextCat/index.html),  a Java wrapper
> for libtextcat

Yes. I saw JTextCat.. I didn't want any JNI used. 

> * NGramJ (http://ngramj.sourceforge.net/), a general n-gram Java library

LGPL.. yuk. That said I think I reviewed this package and found it
lacking.  I started off just trying to find a library to use in our
crawler but never found anything.  Which is why I ended up writing my
own.

> Of these, the Nutch one is certainly under active development, the
> others don't seem to be as far as I can tell.

They should just use ngramcat :)

Kevin

-- 
 Kevin A. Burton, Location - San Francisco, CA
      AIM/YIM - sfburtonator,  Web - http://www.feedblog.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message