From: Niels Ott
Date: Mon, 08 Dec 2008 12:31:27 +0100
To: uima-user@incubator.apache.org
Subject: Re: Language recognition

Torsten Zesch wrote:
> you could use TextCat
> http://odur.let.rug.nl/~vannoord/TextCat/

This works quite well, but it is a bit slow.

If you simply want to know whether a document is written in a given
language or not, the laziest way is to use a spell checker and compute
the percentage of "correctly spelled" words.

Best,

  Niels

-- 
Niels Ott
Computational Linguist (B.A.)
http://www.drni.de/niels/
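
The spell-checker heuristic described above could be sketched roughly as follows. This is only an illustration, not code from the thread: a small hard-coded word set (the hypothetical ENGLISH_WORDS) stands in for a real spell checker's dictionary, the function names are made up, and the 0.5 threshold is an arbitrary assumption that would need tuning on real data.

```python
import re


def known_word_ratio(text, dictionary):
    """Return the fraction of word tokens in `text` that appear in `dictionary`."""
    # Lowercase the text and extract runs of letters (Unicode-aware, no digits).
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    known = sum(1 for word in words if word in dictionary)
    return known / len(words)


def looks_like_language(text, dictionary, threshold=0.5):
    """Guess whether `text` is in the dictionary's language.

    The threshold is an assumed value, not one taken from the thread.
    """
    return known_word_ratio(text, dictionary) >= threshold


# Toy stand-in for a spell checker's word list (assumption for this sketch).
ENGLISH_WORDS = {"the", "cat", "sat", "on", "mat", "a", "is", "this", "document"}

if __name__ == "__main__":
    for sample in ("The cat sat on the mat", "Die Katze sitzt auf der Matte"):
        print(sample, "->", looks_like_language(sample, ENGLISH_WORDS))
```

In practice the set lookup would be replaced by a call to whichever spell checker is available. The approach only answers "is this text in language X or not"; it does not pick among several candidate languages the way TextCat does.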