uima-user mailing list archives

From Peter Klügl <pklu...@uni-wuerzburg.de>
Subject Re: annotation comparator
Date Fri, 05 Sep 2008 15:50:58 GMT

Thanks for your answers.

A default text-based diff of pretty-printed annotations may not be a 
solution for my specific requirements, but it is a nice alternative for 
manual testing (I am already using the pretty-print methods for that). I 
think I will keep my simple solution as a start. It works similarly to 
the one you proposed, but compares the features of the annotations 
directly in Java.
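
As a rough illustration of such a direct comparison, the sketch below sorts
annotations into exact matches, false positives, and false negatives, each
split into overlapping and non-overlapping cases. It is only a minimal
sketch under assumptions: Span is a hypothetical stand-in for a UIMA
annotation (type name plus begin/end offsets), and SpanCompare, Result, and
compare are illustrative names, not part of the UIMA API. Real code would
read the spans from the two CASes and would probably compare further
features as well.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanCompare {

    /** Hypothetical stand-in for an annotation: type name plus character offsets. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        /** Same type and identical offsets. */
        boolean sameAs(Span o) {
            return type.equals(o.type) && begin == o.begin && end == o.end;
        }

        /** Same type and at least one shared character (end is exclusive). */
        boolean overlaps(Span o) {
            return type.equals(o.type) && begin < o.end && o.begin < end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + ")";
        }
    }

    /** Result buckets (illustrative names). */
    static final class Result {
        final List<Span> matched = new ArrayList<>();
        final List<Span> fpOverlapping = new ArrayList<>();
        final List<Span> fpDisjoint = new ArrayList<>();
        final List<Span> fnOverlapping = new ArrayList<>();
        final List<Span> fnDisjoint = new ArrayList<>();
    }

    /** Compares the expected (gold) spans against the spans actually produced. */
    static Result compare(List<Span> expected, List<Span> actual) {
        Result r = new Result();
        // Every produced span is an exact match or some kind of false positive.
        for (Span a : actual) {
            if (expected.stream().anyMatch(a::sameAs)) {
                r.matched.add(a);
            } else if (expected.stream().anyMatch(a::overlaps)) {
                r.fpOverlapping.add(a);
            } else {
                r.fpDisjoint.add(a);
            }
        }
        // Every expected span without an exact counterpart is a false negative.
        for (Span e : expected) {
            if (actual.stream().anyMatch(e::sameAs)) {
                continue;
            }
            if (actual.stream().anyMatch(e::overlaps)) {
                r.fnOverlapping.add(e);
            } else {
                r.fnDisjoint.add(e);
            }
        }
        return r;
    }

    public static void main(String[] args) {
        List<Span> gold = List.of(
                new Span("Person", 0, 5), new Span("Person", 10, 15), new Span("Date", 20, 30));
        List<Span> found = List.of(
                new Span("Person", 0, 5), new Span("Person", 12, 18), new Span("Date", 40, 50));
        Result r = compare(gold, found);
        System.out.println("matched:        " + r.matched);
        System.out.println("FP overlapping: " + r.fpOverlapping);
        System.out.println("FP disjoint:    " + r.fpDisjoint);
        System.out.println("FN overlapping: " + r.fnOverlapping);
        System.out.println("FN disjoint:    " + r.fnDisjoint);
    }
}
```

The quadratic scan keeps the sketch short; for large documents, sorting both
lists by begin offset and walking them in parallel would be the more obvious
choice.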

I was wondering whether the CFE project supports some sort of 
comparison or testing, since the paper has "testing" in its title, but I 
haven't found any suitable fragments in the source code.

In the long run, a good and reusable solution for the comparison and 
automatic back-testing of annotations and/or FS could become an 
interesting component. Maybe there is a possibility to combine some 
efforts? (pointing, amongst others, to Katrin)

Have a nice weekend,


Eddie Epstein wrote:
> The problem with generic CAS comparison is the potential complexity of the
> object model represented in a CAS. Instead of a single general purpose
> method, another approach is application (or object model) specific
> formatting code that would create output specifically designed for
> comparison.
> If the object model to be compared is limited to annotations, just dumping
> all annotations, each as a single line without covered text, in index order
> would be useful as input to a standard diff program. Further sorting by
> annotation type before the diff might help make the differences more
> understandable in some situations.
> If you are interested, there are some annotation pretty print options in
> UIMA that could help here.
> Eddie
> On Thu, Sep 4, 2008 at 7:25 AM, Peter Klügl <pkluegl@uni-wuerzburg.de> wrote:
>> Hi,
>> What is the status quo for the comparison of two CAS right now? Is there
>> any usable solution yet (with or without documentation)?
>> I am developing a rule-based system (with scripting functionality),
>> especially for complex information and text extraction tasks. The IDE is
>> DLTK-based, and UIMA descriptors (for a generic implementation) are generated
>> automatically. Currently I am improving an information extraction application
>> with a test-driven approach. The test cases are, of course, CAS XMI files,
>> and the comparison (of two CAS) is working but not yet satisfactory. I am
>> especially interested in annotations for the false positives and false
>> negatives (overlapping or not overlapping).
>> Back to my question:
>> How do you all compare two CAS?
>> Is there a reusable implementation?
>> Peter
>> Katrin Tomanek wrote:
>>  Hi,
>>>  Depends what your favorite tooling story is.  If you prefer
>>>> the eclipse tooling, it should go into eclipse.  I know
>>>> people who would use this kind of functionality if it was
>>>> in CVD :-)
>>>>> And shouldn't the differences be kept as new annotation types so the
>>>>> viewers don't need to be changed?
>>>> Somehow I don't see that.  The tooling could be made a lot
>>>> nicer if it knows it's displaying differences.  And I wouldn't
>>>> want to add annotations to my data just for display purposes.
>>>> Or maybe I misunderstood?
>>> Mh, not sure. This is probably data that is only used in evaluation
>>> scenarios, so I don't see a big problem with it.
>>> Well, in our first version we now just add new types.  Works OK for us so
>>> far. However, it's really just a first version...
>>> Katrin
>> --
>> Peter Klügl
>> University of Würzburg
>> pkluegl@uni-wuerzburg.de

Peter Klügl
University of Würzburg
