lucene-java-user mailing list archives

From Kevin Burton <>
Subject Re: NGram Language Categorization Source
Date Sun, 21 Aug 2005 20:48:15 GMT
> * A Nutch implementation:
> * A Lucene patch:

A step in the right direction. It doesn't have other language
categories created though.

> * JTextCat (,  a Java wrapper
> for libtextcat

Yes, I saw JTextCat, but I didn't want any JNI used.

> * NGramJ (, a general n-gram Java library

LGPL... yuck. That said, I think I reviewed this package and found it
lacking. I started off just trying to find a library to use in our
crawler but never found anything, which is why I ended up writing my
own.

> Of these, the Nutch one is certainly under active development, the
> others don't seem to be as far as I can tell.

They should just use ngramcat :)


 Kevin A. Burton, Location - San Francisco, CA
      AIM/YIM - sfburtonator,  Web -
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
