incubator-clerezza-dev mailing list archives

From Paolo Ciccarese
Subject URIs, Semantic Web, UIMA and Clerezza
Date Tue, 31 May 2011 22:24:19 GMT
My name is Paolo Ciccarese and I work at Massachusetts General Hospital and
Harvard Medical School where I develop ontologies and knowledge management
tools for biomedical scientists, publishers and pharmaceutical companies.

Over the last year, I have dedicated a great deal of time to DOMEO [1], a web
application for annotating online resources (text, images and data). DOMEO
was developed in parallel with a model for exchanging annotations in RDF
called Annotation Ontology (AO) [2].

One of the features of the tool is support for automatic or semi-automatic
annotation of text documents by leveraging text mining services. We
therefore planned to integrate UIMA with our environment by writing code
that translates the UIMA results into the AO format. Our interest in
Clerezza followed naturally.

Thanks to a short introduction by Tommaso Teofili, within a few hours I had
nearly completed my own Clerezza extensions for performing the AO
serialization. The problem arose when I tried to bring URIs into the
picture. Imagine an entity recognition service returning, for instance, the
DBPedia URIs of cities extracted from a document. In our field, we have
entity recognition services returning entities such as proteins, biological
processes and genes.

Given the way UIMA types work, I can create a field that stores the URI and
pass it along. That would be sufficient, but it would remain my own personal
way of dealing with URIs. As my goal is to integrate several entity
recognition services through the UIMA platform, it would be more beneficial
to establish a convention. For instance, we could ask all developers who want
to integrate with Linked Open Data to extend an abstract class that encodes
the recommended way of handling URIs (the URI of the entity and the related
URIs). If you are not dealing with URIs, you can ignore the convention and
keep doing exactly what you were doing before.
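
To make the idea more concrete, here is a minimal sketch in plain Java of
what such an abstract class might look like. It is only an illustration
under stated assumptions: the names LinkedEntityAnnotation, CityAnnotation
and LinkedEntityDemo are hypothetical and are not part of UIMA or Clerezza,
and in an actual UIMA setup the equivalent would probably be expressed as a
supertype in a type system descriptor rather than as a hand-written class.
The DBpedia URIs are just sample values.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical convention (not an existing UIMA or Clerezza class): every
// annotation that points at a Linked Open Data entity extends this class.
abstract class LinkedEntityAnnotation {
    private final int begin;      // start offset of the mention in the text
    private final int end;        // end offset of the mention (exclusive)
    private final URI entityUri;  // URI identifying the recognized entity
    private final List<URI> relatedUris = new ArrayList<>();

    protected LinkedEntityAnnotation(int begin, int end, URI entityUri) {
        if (begin < 0 || end < begin) {
            throw new IllegalArgumentException("invalid text span");
        }
        this.begin = begin;
        this.end = end;
        this.entityUri = Objects.requireNonNull(entityUri, "entityUri");
    }

    public int getBegin() { return begin; }

    public int getEnd() { return end; }

    public URI getEntityUri() { return entityUri; }

    // Related URIs are optional; services that have none never call this.
    public void addRelatedUri(URI uri) {
        relatedUris.add(Objects.requireNonNull(uri, "uri"));
    }

    // Read-only view, so callers cannot bypass addRelatedUri.
    public List<URI> getRelatedUris() {
        return Collections.unmodifiableList(relatedUris);
    }
}

// Example subtype that a city recognition service might define.
final class CityAnnotation extends LinkedEntityAnnotation {
    CityAnnotation(int begin, int end, URI entityUri) {
        super(begin, end, entityUri);
    }
}

class LinkedEntityDemo {
    public static void main(String[] args) {
        CityAnnotation boston = new CityAnnotation(
                10, 16, URI.create("http://dbpedia.org/resource/Boston"));
        boston.addRelatedUri(
                URI.create("http://dbpedia.org/resource/Massachusetts"));
        System.out.println(boston.getEntityUri()
                + " related: " + boston.getRelatedUris());
    }
}
```

With something along these lines, a serializer (to AO, for example) could
treat every subtype uniformly through getEntityUri() and getRelatedUris(),
while annotators that do not deal with URIs would simply not extend the
class.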

For Semantic Web applications leveraging UIMA, not only in bio-medicine,
this would be a big step forward.

I am really interested in collecting your feedback on this topic.


[2] AO
