uima-user mailing list archives

From holmberg2...@comcast.net (Greg Holmberg)
Subject Re: AW: AW: Using UIMA for structured data sources
Date Tue, 05 Aug 2008 00:12:48 GMT
Gert--

UIMA doesn't store anything at all.  It's just an API you call: document in, annotations out.  That
is to say, Java objects.  What you do with those returned objects is your business.  There's
example code that can write the annotations to an XML file (one XML file for each input document).
If you want to write the annotations to a database, a search engine, an RDF store, etc., you'll
have to write that code.  UIMA knows nothing about RDF or OWL.
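
For illustration, here is a minimal sketch of the "write that code yourself" step, assuming you want N-Triples to load into an RDF store. It deliberately does not use the UIMA API: SimpleAnnotation is a hypothetical stand-in for the annotation objects UIMA hands back, and the http://example.org/uima# namespace with its coveredText/begin/end properties is an invented vocabulary, not something UIMA or RDF defines. In real code you would iterate over the annotations in the CAS and read getType().getName(), getBegin(), getEnd() and getCoveredText() from each Annotation.

```java
import java.util.ArrayList;
import java.util.List;

public class AnnotationToNTriples {

    // HYPOTHETICAL stand-in for a UIMA annotation. A real converter would read
    // these values from org.apache.uima.jcas.tcas.Annotation via
    // getType().getName(), getBegin(), getEnd() and getCoveredText().
    record SimpleAnnotation(String typeName, int begin, int end, String coveredText) {}

    // HYPOTHETICAL vocabulary; substitute your own ontology's namespace.
    static final String NS = "http://example.org/uima#";
    static final String RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";
    static final String XSD_INTEGER = "^^<http://www.w3.org/2001/XMLSchema#integer>";

    // Escape the characters N-Triples forbids unescaped inside a "..." literal.
    static String escapeLiteral(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\' -> sb.append("\\\\");
                case '"' -> sb.append("\\\"");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }

    // One subject IRI per annotation, derived from the document IRI and the
    // annotation's character offsets; four triples per annotation.
    static List<String> toTriples(String docIri, List<SimpleAnnotation> anns) {
        List<String> triples = new ArrayList<>();
        for (SimpleAnnotation a : anns) {
            String subj = "<" + docIri + "#" + a.begin() + "-" + a.end() + ">";
            triples.add(subj + " " + RDF_TYPE + " <" + NS + a.typeName() + "> .");
            triples.add(subj + " <" + NS + "coveredText> \"" + escapeLiteral(a.coveredText()) + "\" .");
            triples.add(subj + " <" + NS + "begin> \"" + a.begin() + "\"" + XSD_INTEGER + " .");
            triples.add(subj + " <" + NS + "end> \"" + a.end() + "\"" + XSD_INTEGER + " .");
        }
        return triples;
    }

    public static void main(String[] args) {
        String doc = "Alice works at ACME.";
        List<SimpleAnnotation> anns = List.of(
                new SimpleAnnotation("Person", 0, 5, doc.substring(0, 5)),
                new SimpleAnnotation("Organization", 15, 19, doc.substring(15, 19)));
        toTriples("http://example.org/doc/1", anns).forEach(System.out::println);
    }
}
```

Minting one subject per annotation from the document IRI plus character offsets keeps the output deterministic. Mapping annotation types onto a real OWL ontology, and turning relation annotations into triples between entities, is the hard part and is not attempted here.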

Greg


 -------------- Original message ----------------------
From: "Villemos, Gert" <gert.villemos@logica.com>
> Luckily we have included some pretty tough semantic / linguistic experts in the 
> project.
>  
> Another question; 
> You mention that we need a UIMA-to-RDF converter. I had assumed that Apache UIMA 
> stored the data graph in RDF format. As this is apparently not the case, which 
> format does UIMA use?
>  
> Thanks,
> Gert.
> 
> ________________________________
> 
> From: Greg Holmberg [mailto:holmberg2066@comcast.net]
> Sent: Tue 05.08.2008 00:47
> To: uima-user@incubator.apache.org
> Cc: Villemos, Gert
> Subject: Re: AW: Using UIMA for structured data sources
> 
> 
> 
> Gert--
> 
> 
> Ah, well, I don't know much about RDF, but you might want to take a look at some 
> of the projects IBM Research has done using UIMA, named entity extraction, and 
> OWL:
> 
>     http://researchweb.watson.ibm.com/UIMA/SUKI/index.html
> 
> Their Semantic Search engine is also interesting:
> 
>     http://domino.research.ibm.com/comm/research_projects.nsf/pages/uima.semanticSearch.html
> 
> There are a lot of pieces you'll need to acquire to make this work: crawlers, 
> adapters, file format filters, an entity and relationship extractor, UIMA-to-RDF 
> converter, etc.  There are many choices both commercial and open source for each 
> of these pieces.
> 
> Except that last one, which I think is a pretty hard problem.  You'll probably 
> also have to hire some computational linguists for the natural languages you 
> want to support, since reliably extracting facts from human-generated text is 
> extremely difficult (if not impossible).  I'd say that the system you describe 
> is probably at or even beyond what researchers are attempting today.  And I'm 
> not aware of any commercial software that actually tries to reason about facts 
> extracted from natural language.
> 
> UIMA can help you process those CLOB and VARCHAR fields from your database, but 
> probably isn't a good match for processing INTEGER, DOUBLE, TIMESTAMP, etc.
> 
> 
> Greg Holmberg
> 
> 
>  -------------- Original message ----------------------
> From: "Villemos, Gert" <gert.villemos@logica.com>
> > Thanks for your answer. Indeed I need to read the UIMA documentation better.
> > 
> > We are building a system that will support Business Intelligence applications
> > based on a data warehouse, as well as knowledge management features based on a
> > knowledge base. We are looking at UIMA for loading data into the knowledge base.
> > 
> > We have multiple data sources. Some are completely structured. Others are
> > semi-structured (well-defined fields, but the main input is free-text fields).
> > And others are completely unstructured (presentations, concept papers, etc.).
> > 
> > We will use the data warehouse for report generation, trending and data mining.
> > 
> > On the knowledge base we would like to perform simple keyword search, and indeed
> > Lucene is a candidate (Solr is a better candidate since, among other things, it
> > supports substitution), but we would also like to perform based reasoning, as well
> > as ontology-based reasoning / derivation of knowledge. We are therefore looking
> > at a knowledge base containing an RDF data graph, not just a flat index.
> > 
> > As far as I have been able to gather from the internet, there has been some
> > discussion on integrating Apache UIMA and Lucene, but no integration has
> > actually been made.
> > 
> > A better way of asking the question is therefore: for our knowledge base, what
> > do we use to create the RDF data graph? Should we:
> > 
> > 1. Split this into two separate tool chains, one for structured data and one for
> >    unstructured data (based on UIMA)?
> > 2. Use UIMA for structured as well as unstructured data?
> > 
> > Gert.
> > 
> > 
> >
> > ________________________________
> >
> > From: Greg Holmberg [mailto:holmberg2066@comcast.net]
> > Sent: Mon 04.08.2008 23:39
> > To: uima-user@incubator.apache.org
> > Cc: Villemos, Gert
> > Subject: Re: Using UIMA for structured data sources
> >
> >
> >
> > Gert--
> >
> >
> > I'm not sure I understand what you're trying to build.  Your description is a
> > little vague.  Perhaps you could provide some use-cases?
> >
> > I recommend that you read the UIMA docs and then ask any questions you still
> > have.
> >
> > Be aware that UIMA is not a search engine.  If all you want to do is index some
> > documents, then maybe all you need is Apache Lucene.  For the structured side,
> > maybe you need a data warehouse.  Or maybe you just want to index some of the
> > CLOBs and VARCHARs into a search engine.  It's hard to tell from your
> > description.
> >
> >
> > Greg Holmberg
> >
> >  -------------- Original message ----------------------
> > From: "Villemos, Gert" <gert.villemos@logica.com>
> > > We have a number of data sources, some of which are fully structured,
> > > others of which are informal (unstructured). We would like to create a
> > > central search facility covering structured as well as unstructured
> > > data.
> > > UIMA seems to fit the bill, but it is focused on unstructured data.
> > > Can/should I also use it to integrate structured data?
> > >
> > > If yes, which modules must I develop for the framework?
> > >
> > > If no, what tools should I use in combination with UIMA to integrate
> > > structured data?
> > >
> > > Thanks,
> > > Gert.
> > >
> > >
> > > This e-mail and any attachment is for authorised use by the intended
> > > recipient(s) only. It may contain proprietary material, confidential information
> > > and/or be subject to legal privilege. It should not be copied, disclosed to,
> > > retained or used by, any other party. If you are not an intended recipient then
> > > please promptly delete this e-mail and any attachment and all copies and inform
> > > the sender. Thank you.
> > >
> > >
> >
> >
> 
> 

