incubator-clerezza-dev mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: URIs, Semantic Web, UIMA and Clerezza
Date Wed, 01 Jun 2011 13:09:59 GMT
Hello Paolo,
I'm really glad to see you here on clerezza-dev@.

2011/6/1 Paolo Ciccarese <paolo.ciccarese@gmail.com>

> My name is Paolo Ciccarese and I work at Massachusetts General Hospital and
> Harvard Medical School where I develop ontologies and knowledge management
> tools for biomedical scientists, publishers and pharmaceutical companies.
>
> In the last year, I have dedicated a great deal of time to DOMEO [1], a web
> application for annotating online resources (text, images and data). DOMEO
> was developed in parallel with a model for exchanging annotations in RDF
> called Annotation Ontology (AO) [2].
>
> One of the features of the tool is the support for automatic or
> semi-automatic annotation of text documents by leveraging text mining
> services. We therefore planned to integrate UIMA with our environment by
> writing code for translating the UIMA results into AO format. The interest
> in Clerezza came naturally.
>

:-)


>
> Thanks to a little introduction by Tommaso Teofili, in a few hours I was
> almost able to complete my own extensions for Clerezza for performing the
> AO serialization. The problem arose when I tried to bring URIs into the
> picture. Imagine an entity recognition service returning, for instance, the
> DBPedia URIs of cities extracted from a document. In our field, we have
> entity recognition services returning entities such as proteins, biological
> processes and genes.
>
> The way UIMA types work, I can create a field storing the URI and pass it
> along. That would be sufficient; however, it would be only my personal way
> of dealing with URIs. As my goal is to integrate several entity recognition
> services through the UIMA platform, it would be more beneficial to
> establish a convention. For instance, we can ask all the developers who
> want to integrate with Linked Open Data to extend an abstract class that
> encodes the recommended way to take care of URIs (the URI of the entity and
> the related URIs). If you are not dealing with URIs, you can ignore the
> convention and keep doing exactly what you were doing before.
>

In my opinion this could be a very simple and helpful addition that would
allow many more Linked Open Data sources to be mined on top of Clerezza
with UIMA engines. In particular, it helps disambiguate extracted named
entities, for example, by allowing persistent references to URIs.
As you can see from UIMAUtils.java in uima.utils at line 100, I also noticed
this problem: services like OpenCalais return not only the named entity type
and the selected text but also a URI. So I think the UIMA integration should
offer a way of dealing with entities' URIs, possibly creating the needed
nodes in the graph.
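
To make the proposed convention concrete, here is a minimal sketch in plain
Java of an abstract class that carries the URI of the entity and its related
URIs. All names here (UriAwareEntity, CityEntity, the accessor methods) are
hypothetical and are not part of UIMA or Clerezza. In a real UIMA setup the
same fields would more likely be declared as features of a type in the type
system descriptor, so take this only as an illustration of the idea.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical base class for the convention: every entity that can be
// linked to Linked Open Data exposes its own URI plus any related URIs.
abstract class UriAwareEntity {
    private final String coveredText;   // text span the entity was found in
    private final URI entityUri;        // may be null when no URI is known
    private final List<URI> relatedUris = new ArrayList<>();

    protected UriAwareEntity(String coveredText, URI entityUri) {
        this.coveredText = coveredText;
        this.entityUri = entityUri;
    }

    String getCoveredText() {
        return coveredText;
    }

    URI getEntityUri() {
        return entityUri;
    }

    // Engines that do not deal with URIs simply leave the URI unset.
    boolean hasEntityUri() {
        return entityUri != null;
    }

    void addRelatedUri(URI uri) {
        relatedUris.add(uri);
    }

    // Read-only view, so callers cannot change the list behind our back.
    List<URI> getRelatedUris() {
        return Collections.unmodifiableList(relatedUris);
    }
}

// Hypothetical concrete entity, e.g. a city recognized in a document.
class CityEntity extends UriAwareEntity {
    CityEntity(String coveredText, URI entityUri) {
        super(coveredText, entityUri);
    }
}
```

A serializer could then check hasEntityUri() and only create a node for the
entity URI when one is present, leaving URI-less annotations untouched.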


>
> For Semantic Web applications leveraging UIMA, not only in bio-medicine,
> this would be a big step forward.
>
> I am really interested in collecting your feedback on this topic.
>

From an implementation point of view, I am thinking of two different ways of
doing this. I'll post those options shortly and I'd be happy to hear your
preferences on how to do what Paolo proposes.
Thank you very much Paolo.
All the best,
Tommaso


>
> Best,
> Paolo
>
>
> [1] DOMEO http://vimeo.com/paolociccarese/videos
> [2] AO http://code.google.com/p/annotation-ontology/
>
