uima-user mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: AW: Using UIMA for structured data sources
Date Mon, 11 Aug 2008 13:30:14 GMT
We usually think of UIMA applications as apps that extract
structured data from unstructured data.  If you look at it
at this level of abstraction, you would be better off dealing
with your structured data outside of UIMA.

There can still be cases where running your structured
data through some UIMA app may be useful, for example for name
normalization or the like.

So, there's no answer that's always right.  If your structured
data is clean enough, I wouldn't send it through UIMA.  If not,
there may be some benefit to using a unified process for both
kinds of data sources.

--Thilo

Villemos, Gert wrote:
> Thanks for your answer. Indeed I need to read the UIMA documentation better.
>  
> We are building a system that will support Business Intelligence applications based on
a data warehouse, as well as knowledge management features based on a knowledge base. We are
looking at UIMA for the loading into the knowledge base.
>  
> We have multiple data sources. Some are completely structured. Others are semi-structured
(well-defined fields, but the main input is free-text fields). Still others are completely
unstructured (presentations, concept papers, etc.).
>  
> The data warehouse we will use for report generation, trending and data mining.
>  
> On the knowledge base we would like to perform simple keyword search and indeed Lucene
is a candidate (Solr is a better candidate as it, among other things, supports substitution), but we
would also like to perform based reasoning, as well as ontology based reasoning / derivation
of knowledge. And we are therefore looking at a knowledge base containing a RDF data graph,
not just a flat index.
>  
> As far as I have been able to gather from the internet there has been some discussion
on integrating Apache UIMA and Lucene, but no integration has actually been made.
>  
> A better way of asking the question is therefore: for our knowledge base, what do we
use to create the RDF data graph? Should we:
>  
> 1. Split this into two separate tool chains, one for structured data and one for unstructured
data (based on UIMA)?
> 2. Use UIMA for structured as well as unstructured?
>  
> Gert.
>  
>  
> 
> ________________________________
> 
> From: Greg Holmberg [mailto:holmberg2066@comcast.net]
> Sent: Mon 04.08.2008 23:39
> To: uima-user@incubator.apache.org
> Cc: Villemos, Gert
> Subject: Re: Using UIMA for structured data sources
> 
> 
> 
> Gert--
> 
> 
> I'm not sure I understand what you're trying to build.  Your description is a little
vague.  Perhaps you could provide some use-cases?
> 
> I recommend that you read the UIMA docs and then ask any questions you still have.
> 
> Be aware that UIMA is not a search engine.  If all you want to do is index some documents,
then maybe all you need is Apache Lucene.  For the structured side, maybe you need a data
warehouse.  Or maybe you just want to index some of the CLOBs and VARCHARS into a search engine.
 It's hard to tell from your description.
> 
> 
> Greg Holmberg
> 
>  -------------- Original message ----------------------
> From: "Villemos, Gert" <gert.villemos@logica.com>
>> We have a number of data sources, some of which are fully structured,
>> others of which are informal (unstructured). We would like to create a
>> central search facility covering structured as well as unstructured
>> data.
>> UIMA seems to fit the bill, but is focused on unstructured data.
>> Can/should I use it to also integrate structured data?
>>
>> If yes, what are the modules which I must develop for the framework?
>>
>> If no, what tools should I use in combination with UIMA to integrate
>> unstructured data?
>>
>> Thanks,
>> Gert.
>>
>>
>> This e-mail and any attachment is for authorised use by the intended
>> recipient(s) only. It may contain proprietary material, confidential information
>> and/or be subject to legal privilege. It should not be copied, disclosed to,
>> retained or used by, any other party. If you are not an intended recipient then
>> please promptly delete this e-mail and any attachment and all copies and inform
>> the sender. Thank you.
>>
>>
