nutch-dev mailing list archives

From Michael Baessler <>
Subject Using Nutch LanguageIdentifierPlugin in Apache UIMA
Date Wed, 15 Aug 2007 12:00:25 GMT

I'm one of the Apache UIMA committers and while searching for an open 
source language detection technology I found the
Nutch LanguageIdentifierPlugin.

First, a short introduction to what UIMA is:
UIMA stands for Unstructured Information Management Architecture and is 
a component architecture and software framework implementation
for the analysis of unstructured content like text, video and audio 
data. The framework has a pluggable architecture to build a chain of
analysis engines to analyze the content. For more detailed 
information about UIMA, please refer to the Apache UIMA homepage.

We are interested in wrapping such a language identifier as a 
UIMA analysis engine, so that it can be used to build an analysis chain 
for text content.
We created a UIMA sandbox to host such analysis engines, so that anyone 
can pick the engines they are interested in and build an analysis chain 
for their needs.
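
To make the idea of wrapping a language identifier as one engine in an 
analysis chain more concrete, here is a minimal sketch. It is plain Python 
for illustration only and is not the UIMA API (UIMA itself is Java). The 
names Document, language_identifier_engine, token_count_engine and 
run_chain are hypothetical, and the stop-word scoring is a toy stand-in 
that says nothing about how the Nutch plugin actually works.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Text plus the annotations that engines attach to it."""
    text: str
    annotations: Dict[str, str] = field(default_factory=dict)


# An analysis engine is modeled here as any callable that annotates
# a Document in place (an assumption made for this sketch).
AnalysisEngine = Callable[[Document], None]

# Toy stop-word lists; a real identifier would use a trained model.
STOP_WORDS = {
    "en": {"the", "and", "is", "of", "to"},
    "de": {"der", "und", "ist", "das", "nicht"},
}


def language_identifier_engine(doc: Document) -> None:
    """Hypothetical engine: guess the language by counting stop-word hits."""
    tokens = doc.text.lower().split()
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOP_WORDS.items()
    }
    best = max(scores, key=scores.get)
    # Report "unknown" when no stop word from any list was seen.
    doc.annotations["language"] = best if scores[best] > 0 else "unknown"


def token_count_engine(doc: Document) -> None:
    """Second hypothetical engine, to show that engines can be chained."""
    doc.annotations["token_count"] = str(len(doc.text.split()))


def run_chain(engines: List[AnalysisEngine], doc: Document) -> Document:
    """Run each engine in order over the same document."""
    for engine in engines:
        engine(doc)
    return doc


if __name__ == "__main__":
    chain = [language_identifier_engine, token_count_engine]
    result = run_chain(chain, Document("Das ist nicht der Hund"))
    print(result.annotations)
```

The point of the sketch is only that each engine sees the same document 
and adds its own annotations, so a language identifier can sit in front of 
engines that depend on knowing the language.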

Now my questions:
Is there a place where I can find more details about how your 
language identification works?
Would it be possible to share the language identification technology so 
that we can wrap it as a UIMA analysis engine? My current understanding 
is that it is only available within Nutch, not separately.

Since both projects are hosted on Apache, I don't see any license issues 
when using your technology. :-)

Thanks for your answers in advance!

-- Michael
