ctakes-notifications mailing list archives

From: "Steven Bethard (JIRA)" <j...@apache.org>
Subject: [jira] [Created] (CTAKES-217) create a tool for "diff"-ing two CASes
Date: Wed, 17 Jul 2013 22:46:49 GMT
Steven Bethard created CTAKES-217:

             Summary: create a tool for "diff"-ing two CASes
                 Key: CTAKES-217
                 URL: https://issues.apache.org/jira/browse/CTAKES-217
             Project: cTAKES
          Issue Type: New Feature
            Reporter: Steven Bethard

It would be handy to be able to easily get a "diff" of two CASes. Some possibilities:

(1) Just diff the XMIs. This doesn't work very well because the IDs are typically different
in different XMIs generated from the same annotations.

(2) Output all annotations, using their .toString(), and diff that file using a standard diff
algorithm. This might mostly work if we could guarantee a consistent ordering of the annotations
in the CAS. (That's easy to do for Annotations, but not always possible for TOPs.) But some
things aren't displayed in the .toString(), e.g. the values inside FSArrays and FSLists.
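
A minimal sketch of option (2), assuming nothing beyond the JDK: Span is a hypothetical stand-in
for a UIMA Annotation (type name plus offsets), not a cTAKES or UIMA class. It only shows how a
fixed sort order makes the dumped lines stable enough to feed to a line-based diff.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class AnnotationDumpSketch {
    /** Hypothetical stand-in for an Annotation: a type name plus character offsets. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Renders spans one per line, ordered by begin, then end, then type name. */
    static List<String> dump(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((Span s) -> s.begin)
                .thenComparingInt(s -> s.end)
                .thenComparing(s -> s.type));
        List<String> lines = new ArrayList<>();
        for (Span s : sorted) {
            lines.add(s.type + "[" + s.begin + "," + s.end + "]");
        }
        return lines;
    }
}
```

Two dumps produced this way can be compared line by line. The sketch sidesteps the two problems
noted above: it has no answer for TOPs without offsets, and it prints only the fields it is
given, so array and list contents would still need to be rendered explicitly.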

In r1504269, I added CompareFeatureStructures, which is neither of these but is a bit closer
to (2). It sorts annotations by offset (and for TOPs, looks through their features to find
offsets), and then compares each pair of FeatureStructures by walking the tree of their features.
I'm mostly happy with how it handles the comparison of two FeatureStructures (though .toString()
is a bit hacky).
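
The idea of comparing a pair of structures by walking their features can be sketched as follows.
This is not the CompareFeatureStructures code from r1504269; it is an illustration that assumes
each structure has already been converted to nested Maps (feature name to value) and Lists
(standing in for FSArray/FSList contents), and that the structures form a tree with no cycles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

final class FeatureWalkSketch {
    /** Returns one message per differing leaf, prefixed by the path of features leading to it. */
    static List<String> compare(Object left, Object right) {
        List<String> diffs = new ArrayList<>();
        walk("", left, right, diffs);
        return diffs;
    }

    private static void walk(String path, Object l, Object r, List<String> out) {
        if (l instanceof Map && r instanceof Map) {
            Map<?, ?> lm = (Map<?, ?>) l;
            Map<?, ?> rm = (Map<?, ?>) r;
            // Visit the union of feature names in sorted order so the output is deterministic.
            TreeSet<String> names = new TreeSet<>();
            for (Object k : lm.keySet()) names.add(String.valueOf(k));
            for (Object k : rm.keySet()) names.add(String.valueOf(k));
            for (String name : names) {
                walk(path + "/" + name, lm.get(name), rm.get(name), out);
            }
        } else if (l instanceof List && r instanceof List) {
            List<?> ll = (List<?>) l;
            List<?> rl = (List<?>) r;
            if (ll.size() != rl.size()) {
                out.add(path + ": length " + ll.size() + " != " + rl.size());
                return;
            }
            for (int i = 0; i < ll.size(); i++) {
                walk(path + "[" + i + "]", ll.get(i), rl.get(i), out);
            }
        } else if (!Objects.equals(l, r)) {
            // Leaf values (or mismatched kinds of value) are compared with equals().
            out.add(path + ": " + l + " != " + r);
        }
    }
}
```

Reporting a path for each mismatch avoids relying on .toString() for the whole structure. A real
CAS can contain reference cycles, so an actual implementation would also need to track visited
structures.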

The main issue is that it doesn't really do anything useful if you have different numbers
of annotations in the two CASes. It just prints a message saying that the numbers are different.
Instead, it should be able to identify insertions and deletions of annotations. Probably there's
a way to do this with java-diff-utils, though I wasn't able to figure one out on my first
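
One way insertions and deletions could be identified, sketched here without java-diff-utils: a
plain longest-common-subsequence diff over the two sorted sequences. The sketch assumes each
annotation has already been reduced to a comparable String key (a hypothetical rendering such as
"Type[begin,end]"); how to build that key is left open.

```java
import java.util.ArrayList;
import java.util.List;

final class LcsDiffSketch {
    /** Returns lines prefixed with "  " (in both), "- " (only in a) or "+ " (only in b). */
    static List<String> diff(List<String> a, List<String> b) {
        int n = a.size();
        int m = b.size();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                out.add("  " + a.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + a.get(i)); // present only in the first CAS
                i++;
            } else {
                out.add("+ " + b.get(j)); // present only in the second CAS
                j++;
            }
        }
        while (i < n) out.add("- " + a.get(i++));
        while (j < m) out.add("+ " + b.get(j++));
        return out;
    }
}
```

The table costs O(n*m) time and memory, which is acceptable for a sketch but may be too much for
documents with very many annotations.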

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira