incubator-general mailing list archives

From Leo Simons <m...@leosimons.com>
Subject Re: Proposal for a new incubation project: Unstructured Information Management Architecture - UIMA
Date Fri, 25 Aug 2006 11:07:59 GMT
Hi Marshall!

I'm sure all this is potentially interesting, but you're going to have
to help us understand why.

On Wed, Aug 23, 2006 at 03:21:55PM -0400, Marshall Schor wrote:
> Proposal for Incubation Project: Unstructured Information Management 
> Architecture - UIMA
> 
> The Unstructured Information Management Architecture (UIMA) is an 
> architecture and software framework for creating, discovering, composing 
> and deploying a broad range of multi-modal analysis capabilities.  We 
> propose a project to develop, implement, support and enhance UIMA 
> framework implementations that comply with the UIMA standard (being put 
> forward concurrently for standardization within OASIS 
> http://www.oasis-open.org - not yet submitted, but we plan to do this 
> early in September.). 
<snip/>
> Motivation for UIMA: Databases are core components of nearly all 
> applications; they store information in structured tables.  But more and 
> more of the available digital data is unstructured (e.g. email, web 
> documents, images, audio clips, video streams) with little information 
> (metadata) attached to explain its content or context.  Although many 
> applications have been built to process unstructured data, they have 
> either managed it as a BLOB or they have developed isolated applications 
> for analyzing the content.  In the absence of a standardized means for 
> analytical applications to share insights extracted from the content, 
> analytical applications cannot build upon one another. As a result, the 
> industry has barely begun to tap the value locked in unstructured 
> information.
<snip/>

What does it *do*? How does it *work*? I understand there's a runtime and
a framework and a standardization process and a component-based
interoperability goal, but what I don't understand is what they are *for*.

Can you please write a paragraph or two that

1) doesn't mention "what the industry is doing" or needs to do
2) doesn't mention frameworks, standards, or current problematic
   industry practices, SOA, SOAP, DARPA, OASIS, or other acronyms
3) outlines what problem this UIMA thing is meant to solve
4) outlines what the approach is to solving that problem
5) outlines how this turns into software
6) gives an example or even two of such software in use in the real world to
   solve some kind of tangible problem

For example, one kind of "unstructured information" is "the web", and one
way to process that is "treat it as plain text, index it, and create a
keyword-based search engine", and then there are also fancier ways, such as all
the things that Google does. And then there are also various ways to make the
unstructured mess that is the web more structured by attaching metadata, e.g.
Dublin Core metadata or the whole semantic web thing. So right now I might walk
away with the understanding that you're devising a way for Google and Yahoo to
interop (which I doubt they really
want) by re-inventing the semantic web movement (which I doubt is really
productive). Enlighten me, please. If it helps, imagine I'm 12 and write PHP and 
have difficulty with words such as interoperability since English is not my first
language.

cheers,

LSD
