Message-ID: <30c6373b05082113485cd3dfa3@mail.gmail.com>
Date: Sun, 21 Aug 2005 13:48:15 -0700
From: Kevin Burton
To: java-user@lucene.apache.org
Subject: Re: NGram Language Categorization Source
References: <30c6373b050819144231447954@mail.gmail.com>

> * A Nutch implementation:
> http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/plugin/languageidentifier/
>
> * A Lucene patch: http://issues.apache.org/bugzilla/show_bug.cgi?id=26763

A step in the right direction. It doesn't have other language
categories created though.

> * JTextCat (http://www.jedi.be/JTextCat/index.html), a Java wrapper
> for libtextcat

Yes. I saw JTextCat.. I didn't want any JNI used.

> * NGramJ (http://ngramj.sourceforge.net/), a general n-gram Java library

LGPL.. yuk. That said I think I reviewed this package and found it lacking.

I started off just trying to find a library to use in our crawler but
never found anything. Which is why I ended up writing my own.

> Of these, the Nutch one is certainly under active development, the
> others don't seem to be as far as I can tell.

They should just use ngramcat :)

Kevin

-- 
Kevin A. Burton, Location - San Francisco, CA
AIM/YIM - sfburtonator, Web - http://www.feedblog.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412