uima-user mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: Entities
Date Thu, 05 Mar 2009 12:44:04 GMT
Hi Frank,
I defined my Entity type as a subtype of uima.cas.TOP (the root of the type
system) because the Annotation features are not required; it has an
attribute that identifies the Entity and an "occurrences" FSArray of the
Annotations that mark the text spans in the document referring to that Entity.
This is useful for me because I have to feed the CAS Consumer with only one
entry per Entity; it would be wasteful to feed it all the Annotations that
identify the same concept.
I read the docs you linked, and I think the ability to establish relations
between entities (or elements) would be a great plus for the UIMA core, not
only for a commercial project.
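
To make the shape of this concrete, here is a minimal sketch in plain Java of
the grouping step, following the HashMap approach Marshall suggested earlier in
the thread. It deliberately does not use the UIMA API: Mention and Entity are
hypothetical stand-in classes (Mention plays the role of an Annotation carrying
an entity id, Entity plays the role of the TOP subtype with an id and its
occurrences). In a real annotator these would presumably be JCas types, with
the occurrences stored in an FSArray.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EntityGrouping {

    // Hypothetical stand-in for an Annotation: a text span plus the id of
    // the entity it mentions. This is NOT a UIMA class.
    static final class Mention {
        final int begin;
        final int end;
        final String entityId;

        Mention(int begin, int end, String entityId) {
            this.begin = begin;
            this.end = end;
            this.entityId = entityId;
        }
    }

    // Hypothetical stand-in for the Entity feature structure: it has no
    // offsets of its own, only an id and the mentions that refer to it.
    static final class Entity {
        final String id;
        final List<Mention> occurrences = new ArrayList<>();

        Entity(String id) {
            this.id = id;
        }
    }

    // Collapse many mentions into one Entity per distinct id.
    // LinkedHashMap keeps entities in order of first appearance.
    static List<Entity> group(List<Mention> mentions) {
        Map<String, Entity> byId = new LinkedHashMap<>();
        for (Mention m : mentions) {
            byId.computeIfAbsent(m.entityId, Entity::new).occurrences.add(m);
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Mention> mentions = new ArrayList<>();
        mentions.add(new Mention(0, 3, "e1"));
        mentions.add(new Mention(10, 12, "e2"));
        mentions.add(new Mention(20, 23, "e1"));

        // A consumer would receive one entry per entity, not one per mention.
        for (Entity e : group(mentions)) {
            System.out.println(e.id + " -> " + e.occurrences.size() + " occurrence(s)");
        }
    }
}
```

With this layout the consumer only iterates over the grouped entities, and each
one still gives access to every text span through its occurrences list.
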
Bye,
Tommaso



2009/3/4 Frank Schilder <frank.schilder@thomsonreuters.com>

> Hi Tommaso,
>
> We defined such a type called Element, but it is not a sub-type of
> Annotation, because it doesn't contain any begin and end offset
> information. Moreover, it contains an attribute that refers back to a list
> of annotations where this element is mentioned in the text.
>
> Terry Heinze, Marc Light and Frank Schilder (2008). Experiences with UIMA
> for online information extraction at Thomson Corporation. In Proceedings of
> the LREC workshop "Towards Enhanced Interoperability for Large HLT Systems:
> UIMA for NLP", Marrakesh, Morocco.
>
> The paper can be found in the proceedings:
> http://www.lrec-conf.org/proceedings/lrec2008/workshops/W16_Proceedings.pdf
>
> There are also slides available (see slides 11 and 12):
>
> http://watchtower.coling.uni-jena.de/~coling/uimaws_lrec2008/slides/marc_light.pdf
>
> Thanks,
> Frank
>
>
>
> > From: Tommaso Teofili <tommaso.teofili@gmail.com>
> > Reply-To: <uima-user@incubator.apache.org>
> > Date: Wed, 4 Mar 2009 12:21:45 +0100
> > To: <uima-user@incubator.apache.org>
> > Subject: Re: Entities
> >
> > Ok, thanks. Done, it works now.
> > I think this could be an interesting predefined feature, as this usage is
> > mentioned in the documentation too.
> > What do you think about it?
> >
> >
> > 2009/3/3 Marshall Schor <msa@schor.com>
> >
> >> There is no predefined Entity type in base UIMA; you will need to define
> >> your own "entity" type.  Suppose it is called "EntityInstance", is a
> >> subtype of Annotation, and includes a field called "id", which is some
> >> unique ID for this entity (perhaps a String type).  Then, you can have
> >> an annotator that runs at the end of your pipeline of annotators which
> >> detects instances of entities (I'm assuming you have multiple annotators
> >> that do this, of course).  This last annotator could get an iteration
> >> index over all things of the "EntityInstance" type, and use a standard
> >> Java hashmap to associate entity unique IDs with Java ArrayLists of
> >> their "instances".  Then, you could make one new Feature Structure, say
> >> of type "Entity", which could have features "uniqueID" and "instances",
> >> and set the "instances" to a FeatureStructure Array of EntityInstances.
> >>
> >> HTH. -Marshall
> >>
> >> Tommaso Teofili wrote:
> >>> Hello everybody,
> >>> I am annotating a document text and I now have a lot of annotations.
> >>> Many of those annotations refer to the same "entity", as described in the
> >>> UIMA Overview & SDK Setup (
> >>> http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/overview_and_setup/overview_and_setup.html#ugr.ovv.conceptual.metadata_in_cas
> >>> ).
> >>> I expected to find a predefined Entity type in UIMA but I cannot find it;
> >>> moreover, even when defining it myself I can't find an appropriate range
> >>> type for the "occurrences" feature to store the annotations related to
> >>> that entity, as stated in the tutorial.
> >>> Any suggestions?
> >>> Thanks in advance,
> >>> Tommaso
> >>>
> >>>
> >>
>
>
