uima-user mailing list archives

From "Nicolas Hernandez" <nicolas.hernan...@gmail.com>
Subject Processing collections as a set of documents
Date Thu, 31 Jan 2008 17:00:59 GMT

While building my first CPE, I am wondering how to handle NLP tasks that
process several documents at a time (i.e. a pair or a collection of
documents considered as a single entity). I am thinking of
applications such as (multilingual) text alignment, term extraction
based on measures computed over a corpus, or text clustering (how to
compare one document with a set of documents)... Such applications
require handling CASes over a kind of "collection artefact".

As far as I can see, only the concepts of Annotation (a description
inside a document) and DocumentAnnotation exist.
I can imagine that CAS Consumers or CAS Multipliers could be used to
work around my problem, but that would only be hacking UIMA.
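
To make the corpus-level case concrete, here is a minimal sketch in plain Java of what I mean by processing that needs the whole collection. It deliberately uses no UIMA classes: the class CollectionTermStats and all of its methods are hypothetical names chosen for illustration. The assumption is that addDocument would be called once per document (for example from the per-CAS step of a consumer), and that the corpus-level measure is only read once every document has been seen.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of UIMA: accumulates statistics over a
// whole collection, one document at a time.
class CollectionTermStats {
    private final Map<String, Integer> documentFrequency = new HashMap<>();
    private int documentCount = 0;

    // Called once per document; counts each distinct term once per document.
    void addDocument(List<String> tokens) {
        documentCount++;
        Set<String> distinct = new HashSet<>(tokens);
        for (String term : distinct) {
            documentFrequency.merge(term, 1, Integer::sum);
        }
    }

    int documentCount() {
        return documentCount;
    }

    // Number of documents seen so far that contain the term.
    int documentFrequency(String term) {
        return documentFrequency.getOrDefault(term, 0);
    }

    // Corpus-level measure: only meaningful after the last document.
    double inverseDocumentFrequency(String term) {
        int df = documentFrequency(term);
        if (df == 0) {
            throw new IllegalArgumentException("unseen term: " + term);
        }
        return Math.log((double) documentCount / df);
    }

    public static void main(String[] args) {
        CollectionTermStats stats = new CollectionTermStats();
        stats.addDocument(Arrays.asList("uima", "cas", "cas"));
        stats.addDocument(Arrays.asList("cas", "alignment"));
        System.out.println("df(cas) = " + stats.documentFrequency("cas"));
        System.out.println("idf(uima) = " + stats.inverseDocumentFrequency("uima"));
    }
}
```

The point of the sketch is that the result belongs to the collection, not to any single document, which is why I am looking for a place in UIMA to attach it.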

Does anyone have experience with similar goals using UIMA? How do you
handle them? Is there something dedicated in UIMA for working with a
"collection artefact"?



# Laboratoire LINA-TALN CNRS UMR 6241
tel. +33 (0)2 51 12 58 55
# Institut Universitaire de Technologie de Nantes - Département Informatique
tel. +33 (0)2 40 30 60 67
