From: "Villemos, Gert"
To: "Greg Holmberg"
Reply-To: uima-user@incubator.apache.org
Subject: AW: AW: Using UIMA for structured data sources
Date: Tue, 5 Aug 2008 00:54:52 +0200

Luckily we have included some pretty tough semantic / linguistic experts in
the project.

Another question;

You mention that we need a UIMA-to-RDF converter. I had assumed that Apache
UIMA stored the data graph in RDF format... as this is apparently not the
case; which format is UIMA using?

Thanks,
Gert.

________________________________

From: Greg Holmberg [mailto:holmberg2066@comcast.net]
Sent: Tue 05.08.2008 00:47
To: uima-user@incubator.apache.org
Cc: Villemos, Gert
Subject: Re: AW: Using UIMA for structured data sources

Gert--

Ah, well, I don't know much about RDF, but you might want to take a look at
some of the projects IBM Research has done using UIMA, named entity
extraction, and OWL:

http://researchweb.watson.ibm.com/UIMA/SUKI/index.html

Their Semantic Search engine is also interesting:

http://domino.research.ibm.com/comm/research_projects.nsf/pages/uima.semanticSearch.html

There are a lot of pieces you'll need to acquire to make this work:
crawlers, adapters, file format filters, an entity and relationship
extractor, UIMA-to-RDF converter, etc. There are many choices both
commercial and open source for each of these pieces.

Except that last one, which I think is a pretty hard problem. You'll
probably also have to hire some computational linguists for the natural
languages you want to support, since reliably extracting facts from
human-generated text is extremely difficult (if not impossible). I'd say
that the system you describe is probably at or even beyond what researchers
are attempting today. And I'm not aware of any commercial software that
actually tries to reason on facts extracted from natural language.

UIMA can help you process those CLOB and VARCHAR fields from your database,
but probably isn't a good match for processing INTEGER, DOUBLE, TIMESTAMP,
etc.

Greg Holmberg

-------------- Original message ----------------------
From: "Villemos, Gert"

> Thanks for your answer. Indeed I need to read the UIMA documentation
> better.
>
> We are building a system that will support Business Intelligence
> applications based on a data warehouse, as well as knowledge management
> features based on a knowledge base. We are looking at UIMA for the loading
> into the knowledge base.
>
> We have multiple data sources, some are completely structured. Others are
> semi-structured (well defined fields, but main input is free text fields).
> And others again are completely unstructured (presentations, concept
> papers, etc).
>
> The data warehouse we will use for report generation, trending and data
> mining.
>
> On the knowledge base we would like to perform simple keyword search and
> indeed Lucene is a candidate (Solr is a better candidate as it, among
> other things, supports substitution) but we would also like to perform
> based reasoning, as well as ontology based reasoning / derivation of
> knowledge. And we are therefore looking at a knowledge base containing an
> RDF data graph, not just a flat index.
>
> As far as I have been able to gather from the internet there has been some
> discussion on integrating Apache UIMA and Lucene, but no integration has
> actually been made.
>
> A better way of asking the question is therefore: for our knowledge base,
> what do we use to create the RDF data graph? Should we:
>
> 1. Split this into two separate tool chains, one for structured data and
>    one for unstructured data (based on UIMA)?
> 2. Use UIMA for structured as well as unstructured?
>
> Gert.
>
> ________________________________
>
> From: Greg Holmberg [mailto:holmberg2066@comcast.net]
> Sent: Mon 04.08.2008 23:39
> To: uima-user@incubator.apache.org
> Cc: Villemos, Gert
> Subject: Re: Using UIMA for structured data sources
>
> Gert--
>
> I'm not sure I understand what you're trying to build. Your description is
> a little vague. Perhaps you could provide some use-cases?
>
> I recommend that you read the UIMA docs and then ask any questions you
> still have.
>
> Be aware that UIMA is not a search engine. If all you want to do is index
> some documents, then maybe all you need is Apache Lucene. For the
> structured side, maybe you need a data warehouse.
> Or maybe you just want to index some of the CLOBs and VARCHARS into a
> search engine. It's hard to tell from your description.
>
> Greg Holmberg
>
> -------------- Original message ----------------------
> From: "Villemos, Gert"
> > We have a number of data sources, some of which are fully structured,
> > others of which are informal (unstructured). We would like to create a
> > central search facility covering structured as well as unstructured
> > data.
> >
> > UIMA seems to fit the bill, but is focused on unstructured data.
> > Can/should I use it to also integrate structured data?
> >
> > If yes, what are the modules which I must develop for the framework?
> >
> > If no, what tools should I use in combination with UIMA to integrate
> > unstructured data?
> >
> > Thanks,
> > Gert.
> >
> > This e-mail and any attachment is for authorised use by the intended
> > recipient(s) only. It may contain proprietary material, confidential
> > information and/or be subject to legal privilege. It should not be
> > copied, disclosed to, retained or used by, any other party. If you are
> > not an intended recipient then please promptly delete this e-mail and
> > any attachment and all copies and inform the sender. Thank you.
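
Note on the "UIMA-to-RDF converter" piece mentioned in the thread: the sketch below illustrates only the final serialization step, and rests on assumptions that are not from the thread. The Entity and Relation records are hypothetical stand-ins for annotations that an extractor would produce (in UIMA these would be read from the CAS rather than built by hand), the http://example.org/kb/ namespace is made up, and N-Triples is just one possible RDF syntax. It does not touch the part Greg calls hard, which is extracting reliable facts from text in the first place.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Hypothetical namespace, for illustration only.
BASE = "http://example.org/kb/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for an entity annotation over a text span."""
    doc_id: str
    begin: int
    end: int
    entity_type: str
    text: str


@dataclass(frozen=True)
class Relation:
    """Hypothetical stand-in for a relationship between two entities."""
    predicate: str
    subject: Entity
    obj: Entity


def entity_iri(e: Entity) -> str:
    """Build an IRI from the document id and the character offsets."""
    return f"<{BASE}{quote(e.doc_id, safe='')}#{e.begin}-{e.end}>"


def literal(s: str) -> str:
    """Escape a string as a plain N-Triples literal."""
    escaped = (
        s.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(entities, relations):
    """Return one N-Triples line per asserted fact."""
    lines = []
    for e in entities:
        subject = entity_iri(e)
        type_iri = f"<{BASE}type/{quote(e.entity_type, safe='')}>"
        lines.append(f"{subject} {RDF_TYPE} {type_iri} .")
        lines.append(f"{subject} {RDFS_LABEL} {literal(e.text)} .")
    for r in relations:
        predicate = f"<{BASE}rel/{quote(r.predicate, safe='')}>"
        lines.append(f"{entity_iri(r.subject)} {predicate} {entity_iri(r.obj)} .")
    return lines
```

Calling to_ntriples() with a list of entities and a list of relations returns strings that can be written to a file and loaded into an RDF store. Deriving IRIs from document offsets keeps every mention distinct; merging mentions that refer to the same real-world thing would be a separate step that this sketch does not attempt.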