uima-user mailing list archives

From Jens Grivolla <j+...@grivolla.net>
Subject Re: merging CAS objects
Date Fri, 16 Sep 2011 09:14:53 GMT
On 09/16/2011 10:43 AM, Alexander Klenner wrote:
> I have a question concerning the merging of different UIMA pipelines.
> Say I have 3 different annotators that work on the same document (the
> CAS sofa data is identical for each of the pipelines). They run in
> parallel, and all of them produce different annotations but in a sofa
> with the same name (_textView). Finally I have 3 serialized XCAS files
> in three different folders, coming from different nodes of a
> cluster.

We have the same problem sometimes, and I'd be very interested in a 
"clean" solution.

> Is there a UIMA-conformant way to merge the corresponding XML files
> into one CAS object that has all the annotations of the three
> separate files? I could easily do this with a non-UIMA Java class
> that just adds all the annotation information into one file. Since
> the sofa data is the same, the offset information of the annotations
> will be correct, but I'd rather stay in the UIMA context.

We actually edit XMI files using Python scripts to add annotations that 
come from outside UIMA, etc.  However, especially given the very 
unfortunate disappearance of Ed Loper's uimapy, our approach is a bit 
hacky, e.g. in how it deals with xmi:id values, namespace prefixes for 
type systems, etc.  Also, XMI allows many different representations 
of the same information, and our scripts really only handle the most 
common one (features as attributes).
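
To give an idea of what such a script involves, here is a minimal sketch 
(not our actual code) that copies the annotations of one XMI file into 
another.  It assumes both files hold the same sofa text in a single view, 
that every copied element is a flat annotation whose features are plain 
attributes, and that no feature refers to another feature structure by 
xmi:id.  The name merge_xmi is made up for illustration.

```python
import xml.etree.ElementTree as ET

XMI_NS = "http://www.omg.org/XMI"
CAS_NS = "http:///uima/cas.ecore"
TCAS_NS = "http:///uima/tcas.ecore"
XMI_ID = f"{{{XMI_NS}}}id"

# Keep the usual prefixes on output. Type system namespaces that are not
# registered here get generated prefixes (ns0, ns1, ...), which is still
# valid XML but not what the original file used.
ET.register_namespace("xmi", XMI_NS)
ET.register_namespace("cas", CAS_NS)
ET.register_namespace("tcas", TCAS_NS)

# Elements that describe the CAS itself and must not be duplicated.
SKIP = {
    f"{{{CAS_NS}}}NULL",
    f"{{{CAS_NS}}}Sofa",
    f"{{{CAS_NS}}}View",
    f"{{{TCAS_NS}}}DocumentAnnotation",
}


def merge_xmi(base_xml: str, other_xml: str) -> str:
    """Hypothetical helper: add the annotations of other_xml to base_xml."""
    base = ET.fromstring(base_xml)
    other = ET.fromstring(other_xml)

    sofa = base.find(f"{{{CAS_NS}}}Sofa")
    view = base.find(f"{{{CAS_NS}}}View")
    other_sofa = other.find(f"{{{CAS_NS}}}Sofa")
    if sofa is None or view is None or other_sofa is None:
        raise ValueError("expected one cas:Sofa and one cas:View per file")
    if sofa.get("sofaString") != other_sofa.get("sofaString"):
        raise ValueError("sofa text differs, offsets would not line up")

    # xmi:id values must be unique within a file, so renumber the copies.
    next_id = max(
        int(el.get(XMI_ID)) for el in base if el.get(XMI_ID) is not None
    ) + 1
    members = view.get("members", "").split()

    # Insert before cas:View so the view stays at the end of the file.
    insert_at = list(base).index(view)

    for el in list(other):
        if el.tag in SKIP:
            continue
        el.set(XMI_ID, str(next_id))
        el.set("sofa", sofa.get(XMI_ID))   # point at the base file's sofa
        base.insert(insert_at, el)
        insert_at += 1
        members.append(str(next_id))       # index the copy in the view
        next_id += 1

    view.set("members", " ".join(members))
    return ET.tostring(base, encoding="unicode")
```

Anything beyond this simple case (features stored as child elements, 
references between feature structures, several views) needs id remapping 
and more care, which is where it gets hacky.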

I guess in Java you can at least use 
org.apache.uima.cas.impl.XmiCasDeserializer and 
org.apache.uima.cas.impl.XmiCasSerializer to avoid the XMI-specific details.

