From: "Chris Lu"
To: java-user@lucene.apache.org
Date: Thu, 3 May 2007 17:20:16 -0700
Subject: Re: Language detection library

I suppose that if a document is indexed as English or French, then when users search for it, we also need to parse the query as English or French?

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes

On 5/3/07, karl wettin wrote:
>
> On 3 May 2007, at 22:06, Mordo, Aviran (EXP N-NANNATEK) wrote:
>
> > Anyone knows of a good language detection library that can detect what
> > language a document (text) is ?
>
> I posted this some time back:
>
> https://issues.apache.org/jira/browse/LUCENE-826
>
> A bit of proof-of-concept:ish, but it does the job well if you ask
> me. Uses Weka (GPL) and requires at least 150 characters to be trusted.
>
> --
> karl

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
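
To illustrate the question raised in the reply above (whether the query has to be parsed in the same language the document was indexed in), here is a minimal sketch in plain Java. It does not use any Lucene API: the class `LanguageAwareSearch`, its methods, and the tiny stop-word lists are all hypothetical stand-ins for real per-language analyzers. It only shows why mismatched analysis at index time and query time can lose matches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical toy index; not Lucene code.
class LanguageAwareSearch {
    // Assumed stop-word lists, standing in for real language-specific analyzers.
    private static final Map<String, Set<String>> STOP_WORDS = Map.of(
            "en", Set.of("the", "a", "of", "and"),
            "fr", Set.of("le", "la", "les", "de", "et"));

    // term -> ids of the documents that contain it
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Lower-cases the text, splits on non-letters, drops the language's stop words.
    static List<String> analyze(String lang, String text) {
        Set<String> stops = STOP_WORDS.getOrDefault(lang, Set.of());
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !stops.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Indexes a document using the analysis rules of its (detected) language.
    void add(int docId, String lang, String text) {
        for (String term : analyze(lang, text)) {
            List<Integer> ids = postings.computeIfAbsent(term, k -> new ArrayList<>());
            if (!ids.contains(docId)) {
                ids.add(docId);
            }
        }
    }

    // AND query: a document must contain every term the query analysis produces.
    List<Integer> search(String queryLang, String query) {
        List<Integer> result = null;
        for (String term : analyze(queryLang, query)) {
            List<Integer> ids = postings.getOrDefault(term, List.of());
            if (result == null) {
                result = new ArrayList<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? List.of() : result;
    }

    public static void main(String[] args) {
        LanguageAwareSearch index = new LanguageAwareSearch();
        index.add(1, "en", "The history of Paris");
        index.add(2, "fr", "Le chat et la souris");

        // Query parsed as French: "la" is dropped, "souris" matches document 2.
        System.out.println(index.search("fr", "la souris")); // prints [2]
        // Query parsed as English: "la" is kept but was never indexed, so nothing matches.
        System.out.println(index.search("en", "la souris")); // prints []
    }
}
```

In this sketch the French document is found only when the query goes through the same French rules that were applied at index time, which is the behaviour the question anticipates.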