uima-user mailing list archives

From "Igor Sominsky" <somin...@gmail.com>
Subject Re: annotation comparator
Date Fri, 05 Sep 2008 18:04:42 GMT
Peter,

CFE supports configuration-driven feature extraction. The extracted features 
can be used for comparison, among other things. As Eddie pointed out in 
his email, the application decides which features are relevant to a 
particular comparison. The criteria for comparing extracted 
features can also differ from one application to another.
With CFE we performed the comparison in 3 major steps:
1. Identification of the features to be compared in both sources and the 
rules for comparing them. In this step we write the configuration files for 
feature extraction.
2. Feature extraction and alignment of the extracted features. In this step 
the identified features are extracted from both sources being compared 
into character-separated files, and they are aligned based on the begin|end 
offsets of their containing annotation objects.
3. The results of the alignment are imported into a spreadsheet, where 
performance metrics (precision/recall/f-score) are calculated.
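
To illustrate steps 2 and 3, here is a minimal sketch in Python, not CFE 
code. It assumes each extracted file has one row per annotation in the form 
begin|end|value; that layout, the function names and the sample data are all 
made up for the example. Rows are aligned only when their begin/end offsets 
are identical, and overlapping spans are not handled.

```python
import csv
import io


def read_features(text, delimiter="|"):
    """Parse rows of begin|end|value into a dict keyed by (begin, end).

    The three-column layout is an assumption for this sketch, not the
    actual CFE output format.
    """
    rows = {}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:  # skip blank lines
            continue
        begin, end, value = row
        rows[(int(begin), int(end))] = value
    return rows


def align(gold, system):
    """Pair feature values whose (begin, end) offsets are identical."""
    spans = sorted(set(gold) | set(system))
    return [(span, gold.get(span), system.get(span)) for span in spans]


def score(aligned):
    """Return (precision, recall, f_score) for the aligned rows."""
    tp = sum(1 for _, g, s in aligned if g is not None and g == s)
    fp = sum(1 for _, g, s in aligned if s is not None and g != s)
    fn = sum(1 for _, g, s in aligned if g is not None and g != s)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical sample data standing in for the two extracted files.
GOLD = "0|5|Person\n10|18|Location\n20|25|Person\n"
SYSTEM = "0|5|Person\n10|18|Person\n30|34|Location\n"

aligned = align(read_features(GOLD), read_features(SYSTEM))
precision, recall, f_score = score(aligned)
print(precision, recall, f_score)
```

A span that appears in both files with different values is counted once as 
a false positive and once as a false negative. That is only one possible 
rule; as noted above, the comparison criteria depend on the application.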

No doubt the process should be further automated.

Let me know if you have any questions.

----- Original Message ----- 
From: "Peter Klügl" <pkluegl@uni-wuerzburg.de>
To: <uima-user@incubator.apache.org>
Sent: Friday, September 05, 2008 11:50 AM
Subject: Re: annotation comparator


> Hi,
>
> thanks for your answers.
>
> A default text-based diff of pretty printed annotations may not be a 
> solution for my specific requirements, but is a nice alternative for 
> manual testing (I am already using the pretty print methods for that). I 
> think I will keep my simple solution as a start, which works similarly 
> to your proposed one, but directly compares the features of the 
> annotations in Java.
>
> I was wondering if the CFE project was supporting some sort of comparison 
> or testing since the paper has "testing" in its title, but I haven't found 
> any suitable fragments in the source code.
>
> In the long run, a good and reusable solution for the comparison and 
> automatic back-testing of annotations and/or FS could become an interesting 
> component. Maybe there is a possibility to combine some efforts? (pointing 
> amongst others to Katrin)
>
> have a nice weekend,
>
> Peter
>
>
>
> Eddie Epstein schrieb:
>> The problem with generic CAS comparison is the potential complexity of 
>> the
>> object model represented in a CAS. Instead of a single general purpose
>> method, another approach is application (or object model) specific
>> formatting code that would create output specifically designed for
>> comparison.
>>
>> If the object model to be compared is limited to annotations, just 
>> dumping
>> all annotations, each as a single line without covered text, in index 
>> order
>> would be useful as input to a standard diff program. Further sorting by
>> annotation type before the diff might help make the differences more
>> understandable in some situations.
>>
>> If you are interested, there are some annotation pretty print options in
>> UIMA that could help here.
>>
>> Eddie
>>
>> On Thu, Sep 4, 2008 at 7:25 AM, Peter Klügl 
>> <pkluegl@uni-wuerzburg.de>wrote:
>>
>>
>>> Hi,
>>>
>>> what is the status quo for the comparison of two CAS right now? Is there
>>> yet any usable solution (with or without documentation)?
>>>
>>> I am developing a rule-based system (with scripting functionalities)
>>> especially for complex information and text extraction tasks. The IDE is
>>> DLTK-based and UIMA descriptors (for a generic implementation) are 
>>> generated
>>> automatically. Currently I am improving an information extraction 
>>> application
>>> with a test-driven approach. The test cases are, of course, CAS XMI 
>>> files
>>> and the comparison (of two CAS) is working, but yet unsatisfying. I am
>>> especially interested in annotations for the false positives and false
>>> negatives (overlapping or not overlapping).
>>>
>>> Back to my question:
>>> How do you all compare two CAS?
>>> Is there a reusable implementation?
>>>
>>>
>>> Peter
>>>
>>>
>>> Katrin Tomanek schrieb:
>>>
>>>  Hi,
>>>
>>>>  Depends what your favorite tooling story is.  If you prefer
>>>>
>>>>> the eclipse tooling, it should go into eclipse.  I know
>>>>> people who would use this kind of functionality if it was
>>>>> in CVD :-)
>>>>>
>>>>>
>>>>>
>>>>>> And shouldn't the differences be kept as new annotation types so the
>>>>>> viewers don't need to be changed?
>>>>>>
>>>>>>
>>>>> Somehow I don't see that.  The tooling could be made a lot
>>>>> nicer if it knows it's displaying differences.  And I wouldn't
>>>>> want to add annotations to my data just for display purposes.
>>>>> Or maybe I misunderstood?
>>>>>
>>>>>
>>>> Mh, not sure. This is probably data that is only used in evaluation
>>>> scenarios, so I don't see a big problem with it.
>>>> Well, in our first version we now just add new types.  Works OK for us 
>>>> so
>>>> far. However, it's really just a first version...
>>>>
>>>> Katrin
>>>>
>>>>
>>> --
>>> Peter Klügl
>>> University of Würzburg
>>> pkluegl@uni-wuerzburg.de
>>>
>>>
>>>
>>
>>
>
>
> -- 
> Peter Klügl
> University of Würzburg
> pkluegl@uni-wuerzburg.de
> 

