uima-user mailing list archives

From Richard Eckart de Castilho <eck...@ukp.informatik.tu-darmstadt.de>
Subject Re: Should document level annotations inherit from DocumentAnnotation?
Date Thu, 15 Nov 2012 23:02:56 GMT

The DocumentAnnotation is special in the CAS. There should be only one per CAS, and
if it is not added manually, it is created automatically when you call, e.g., setDocumentLanguage().
It is possible to use a custom type as the DocumentAnnotation, e.g. if you want to add more metadata
to your CAS (source URL, etc.). In DKPro Core, we consistently use our DocumentMetaData
annotation, which maintains fields like URL, Base URL, Document ID, Collection ID, etc.
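
To make the two options discussed in this thread concrete, here is a minimal sketch of a type system descriptor. It is an illustration under assumptions, not the DKPro Core definition: the type names org.example.SourceDocumentInfo and org.example.DocumentCategory and the feature names are hypothetical (the category/score features follow the question quoted below), while uima.tcas.DocumentAnnotation, uima.cas.AnnotationBase, uima.cas.String and uima.cas.Double are built-in UIMA types.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <name>DocumentLevelTypesSketch</name>
  <description>Hypothetical example types for document-level information.</description>
  <version>1.0</version>
  <vendor/>
  <types>
    <!-- Option 1: a custom document annotation carrying extra metadata.
         It inherits 'begin', 'end' and 'language' from its supertypes. -->
    <typeDescription>
      <name>org.example.SourceDocumentInfo</name>
      <description>Hypothetical subtype of the built-in DocumentAnnotation.</description>
      <supertypeName>uima.tcas.DocumentAnnotation</supertypeName>
      <features>
        <featureDescription>
          <name>sourceUrl</name>
          <description>Hypothetical feature: where the document came from.</description>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
    <!-- Option 2: document-level metadata without 'begin' and 'end'.
         AnnotationBase only contributes the 'sofa' feature. -->
    <typeDescription>
      <name>org.example.DocumentCategory</name>
      <description>Hypothetical classifier result for the whole document.</description>
      <supertypeName>uima.cas.AnnotationBase</supertypeName>
      <features>
        <featureDescription>
          <name>category1</name>
          <description>Top predicted category.</description>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
        <featureDescription>
          <name>score1</name>
          <description>Score of the top predicted category.</description>
          <rangeTypeName>uima.cas.Double</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>
```

The second type has no offsets at all, so it does not pretend to cover a span of text. The first remains a regular annotation, which is the route to take if the extra fields should live on the document annotation itself.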

-- Richard

On 15.11.2012 at 19:27, Himanshu Gahlot <himanshu.gahlot86@gmail.com> wrote:

> I think I have found the answer in the AnnotationBase documentation:
> http://uima.apache.org/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.aas.annotationbase.
> It says that Annotation is just one sub-type of AnnotationBase which is
> suitable for textual annotations and users should inherit from
> AnnotationBase to create their own annotations which may not have 'begin'
> and 'end' features. So, I think if I need document level annotations which
> lack 'begin' and 'end' features then there is no restriction in inheriting
> from AnnotationBase.
> Himanshu
> On Wed, Nov 14, 2012 at 4:23 PM, Himanshu Gahlot <
> himanshu.gahlot86@gmail.com> wrote:
>> Hi,
>> I am a little confused about inheriting a document level annotation
>> from uima.jcas.tcas.DocumentAnnotation. I am of the view that a document
>> level annotation may not necessarily have 'begin' and 'end' features. For
>> example, I may want to have a document level annotation such as
>> DocumentCategory which has features such as category1, category2,
>> category3, score1, score2, score3, etc., where categories and scores are
>> the top 3 categories/scores for this document predicted by a document
>> classification algorithm. In such a case, it does not make sense to have
>> 'begin' and 'end' (and even 'coveredText') as features of DocumentCategory,
>> since the categories do not exist in the document text itself but rather
>> just act like document metadata. Hence, I think it makes more sense to make
>> DocumentCategory inherit from AnnotationBase (which lacks 'begin' and 'end'
>> features) rather than from DocumentAnnotation. But I have not seen people
>> inheriting directly from AnnotationBase. Are there restrictions around not
>> inheriting from DocumentAnnotation or Annotation classes and directly
>> inheriting from AnnotationBase that I should be aware of? Does my
>> understanding of a document level annotation and its proposed lack of
>> 'begin' and 'end' features make sense?
>> -Himanshu

Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab (UKP-TUD) 
FB 20 Computer Science Department      
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
