uima-user mailing list archives

From "Eddie Epstein" <eaepst...@gmail.com>
Subject Re: annotation comparator
Date Thu, 04 Sep 2008 20:26:01 GMT
The problem with generic CAS comparison is the potential complexity of the
object model represented in a CAS. Instead of a single general-purpose
method, another approach is application-specific (or object-model-specific)
formatting code that would create output specifically designed for

If the object model to be compared is limited to annotations, just dumping
all annotations, each as a single line without covered text, in index order
would be useful as input to a standard diff program. Further sorting by
annotation type before the diff might help make the differences more
understandable in some situations.
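
A minimal sketch of the dump-and-diff idea described above, written in plain Python rather than against the UIMA API. The `Ann` tuple, `dump`, and `diff_dumps` are hypothetical names introduced only for illustration; the sort key assumes that index order means ascending begin offset, then descending end offset, which approximates the default annotation index but ignores type priorities.

```python
import difflib
from typing import Iterable, List, NamedTuple


class Ann(NamedTuple):
    """Hypothetical stand-in for an annotation: type name plus offsets."""
    type: str
    begin: int
    end: int


def dump(annotations: Iterable[Ann], by_type: bool = False) -> List[str]:
    """Render each annotation as one tab-separated line, without covered text."""
    if by_type:
        # Group by type first, then keep the assumed index order within a type.
        key = lambda a: (a.type, a.begin, -a.end)
    else:
        # Assumed index order: begin ascending, end descending.
        key = lambda a: (a.begin, -a.end, a.type)
    return [f"{a.type}\t{a.begin}\t{a.end}" for a in sorted(annotations, key=key)]


def diff_dumps(expected: Iterable[Ann], actual: Iterable[Ann],
               by_type: bool = False) -> List[str]:
    """Return a unified diff (no context lines) of the two dumps."""
    return list(difflib.unified_diff(
        dump(expected, by_type), dump(actual, by_type),
        fromfile="expected", tofile="actual", lineterm="", n=0))


if __name__ == "__main__":
    expected = [Ann("Person", 0, 5), Ann("Date", 10, 14), Ann("Person", 20, 25)]
    actual = [Ann("Person", 0, 5), Ann("Date", 10, 16), Ann("Person", 20, 25)]
    print("\n".join(diff_dumps(expected, actual)))
```

Writing the two dumps to files and running a standard diff program on them would serve the same purpose; `difflib` is used here only to keep the sketch self-contained.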

If you are interested, there are some annotation pretty print options in
UIMA that could help here.


On Thu, Sep 4, 2008 at 7:25 AM, Peter Klügl <pkluegl@uni-wuerzburg.de> wrote:

> Hi,
> what is the status quo for the comparison of two CASes right now? Is there
> any usable solution yet (with or without documentation)?
> I am developing a rule-based system (with scripting functionality),
> especially for complex information and text extraction tasks. The IDE is
> DLTK-based and UIMA descriptors (for a generic implementation) are generated
> automatically. Currently I am improving an information extraction application
> with a test-driven approach. The test cases are, of course, CAS XMI files
> and the comparison (of two CASes) is working, but still unsatisfying. I am
> especially interested in annotations for the false positives and false
> negatives (overlapping or not overlapping).
> Back to my question:
> How do you all compare two CAS?
> Is there a reusable implementation?
> Peter
> Katrin Tomanek wrote:
>  Hi,
>>  Depends what your favorite tooling story is.  If you prefer
>>> the eclipse tooling, it should go into eclipse.  I know
>>> people who would use this kind of functionality if it was
>>> in CVD :-)
>>>> And shouldn't the differences be kept as new annotation types so the
>>>> viewers don't need to be changed?
>>> Somehow I don't see that.  The tooling could be made a lot
>>> nicer if it knows it's displaying differences.  And I wouldn't
>>> want to add annotations to my data just for display purposes.
>>> Or maybe I misunderstood?
>> Mh, not sure. This is probably data that is only used in evaluation
>> scenarios, so I don't see a big problem with it.
>> Well, in our first version we now just add new types.  Works OK for us so
>> far. However, it's really just a first version...
>> Katrin
> --
> Peter Klügl
> University of Würzburg
> pkluegl@uni-wuerzburg.de
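
The false positive / false negative distinction raised in the quoted question (overlapping or not overlapping) could be sketched roughly as follows. This is plain Python, not the UIMA API; `Ann`, `overlaps`, and `classify` are hypothetical names, and the sketch assumes that two annotations match only when type, begin, and end are all equal, and that overlap is only counted between annotations of the same type.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Ann(NamedTuple):
    """Hypothetical stand-in for an annotation: type name plus offsets."""
    type: str
    begin: int
    end: int


def overlaps(a: Ann, b: Ann) -> bool:
    """True if both annotations have the same type and share at least one character."""
    return a.type == b.type and a.begin < b.end and b.begin < a.end


def classify(gold: Iterable[Ann], predicted: Iterable[Ann]
             ) -> Tuple[List[Tuple[Ann, bool]], List[Tuple[Ann, bool]]]:
    """Return (false_positives, false_negatives).

    Each entry is (annotation, overlapping), where `overlapping` says whether
    the annotation overlaps some same-type annotation in the other set.
    """
    gold_set, pred_set = set(gold), set(predicted)
    order = lambda a: (a.begin, a.end, a.type)
    # Predicted but not in gold: false positives.
    false_pos = [(p, any(overlaps(p, g) for g in gold_set))
                 for p in sorted(pred_set - gold_set, key=order)]
    # In gold but not predicted: false negatives.
    false_neg = [(g, any(overlaps(g, p) for p in pred_set))
                 for g in sorted(gold_set - pred_set, key=order)]
    return false_pos, false_neg
```

An overlapping false positive paired with an overlapping false negative usually indicates a boundary error rather than a completely spurious or missed annotation, which is why the two cases are reported separately here.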
