Message-ID: <46C2EAD9.5010603@michael-baessler.de>
Date: Wed, 15 Aug 2007 14:00:25 +0200
From: Michael Baessler
To: nutch-dev@lucene.apache.org
Subject: Using Nutch LanguageIdentifierPlugin in Apache UIMA

Hi,

I'm one of
the Apache UIMA committers, and while searching for an open source language detection technology I found the Nutch LanguageIdentifierPlugin.

First, a short introduction to UIMA: UIMA stands for Unstructured Information Management Architecture. It is a component architecture and software framework implementation for the analysis of unstructured content such as text, video and audio data. The framework has a pluggable architecture for building a chain of analysis engines that analyze the content. For more detailed information about UIMA, please refer to the Apache UIMA homepage: http://incubator.apache.org/uima/

We are interested in a language identifier technology like yours so that we can wrap it as a UIMA analysis engine, which could then be used in an analysis chain for text content. We created a UIMA sandbox to host such analysis engines, so that anyone can pick the engines they are interested in and build an analysis chain for their needs.

Now my questions:

Is there a place where I can find more details about how your language identification works?

Would it be possible to share the language identification technology so that we can wrap it as a UIMA analysis engine? My current understanding is that it is only available within Nutch, not separately.

Since both projects are hosted at Apache, I don't see any license issues with using your technology. :-)

Thanks in advance for your answers!

--
Michael