From: Mead Lai
Date: Mon, 24 Oct 2011 18:29:48 +0800
Subject: Re: Language Identifier with Lucene?
To: java-user@lucene.apache.org

Luca,

I would like to know: how many languages can your system identify?
In my view, the difficult part of your system is how *one person* can collect so many of the world's languages/characters...

Regards,
Mead

On Sun, Oct 23, 2011 at 1:27 AM, Petite Abeille wrote:

> On Oct 22, 2011, at 2:49 AM, Luca Rondanini wrote:
>
> > I usually use Nutch for this but, just for fun, I tried to create a language
> > identifier based on Lucene only.
>
> Talking of which:
>
> Google's Compact Language Detector
>
> http://blog.mikemccandless.com/2011/10/language-detection-with-googles-compact.html