ctakes-notifications mailing list archives

From "Steven Bethard (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CTAKES-217) create a tool for "diff"-ing two CASes
Date Thu, 18 Jul 2013 02:18:49 GMT

    [ https://issues.apache.org/jira/browse/CTAKES-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711942#comment-13711942 ]

Steven Bethard commented on CTAKES-217:
---------------------------------------

Thanks for the link; I didn't know about xcas_comparison. However, it all revolves around
XCAS files and does not work with a JCas directly, so it looks like it would be pretty
difficult to adapt that code.
                
> create a tool for "diff"-ing two CASes
> --------------------------------------
>
>                 Key: CTAKES-217
>                 URL: https://issues.apache.org/jira/browse/CTAKES-217
>             Project: cTAKES
>          Issue Type: New Feature
>            Reporter: Steven Bethard
>
> It would be handy to be able to easily get a "diff" of two CASes. Some possibilities:
> (1) Just diff the XMIs. This doesn't work very well because the IDs are typically different
> in different XMIs generated from the same annotations.
> (2) Output all annotations, using their .toString(), and diff that file using a standard
> diff algorithm. This might mostly work if we could guarantee a consistent ordering of the
> annotations in the CAS. (That's easy to do for Annotations, but not always possible for TOPs.)
> But some things aren't displayed in the .toString(), e.g. the values inside FSArrays and FSLists.
> In r1504269, I added CompareFeatureStructures, which is neither of these but is a bit
> closer to (2). It sorts annotations by offset (and, for TOPs, looks through their features
> to find offsets), then compares each pair of FeatureStructures by walking the tree of
> their features. I'm mostly happy with how it handles the comparison of two FeatureStructures
> (though the .toString() is a bit hacky).
> The main issue is that it doesn't do anything useful if the two CASes contain different
> numbers of annotations; it just prints a message saying that the numbers differ. Instead,
> it should be able to identify insertions and deletions of annotations. There is probably
> a way to do this with java-diff-utils, though I wasn't able to figure one out on my first
> attempt.
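Option (2) above hinges on two things: a deterministic ordering, and a rendering that includes array contents that .toString() may leave out. A minimal sketch of that idea follows, assuming a hypothetical, simplified `Ann` record in place of real UIMA annotations (it is not the UIMA or cTAKES API, and it does not attempt the harder TOP case). Sorting uses begin, end, type name, and finally the rendered array values, so two CASes holding the same annotations in different index order produce identical line lists that any standard line diff can consume.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AnnotationLines {
    // Hypothetical, simplified stand-in for an annotation; NOT a real UIMA type.
    // arrayValues plays the role of FSArray/FSList contents.
    record Ann(String type, int begin, int end, List<String> arrayValues) {}

    // Renders one annotation per line, including the array contents.
    static String render(Ann a) {
        return a.type() + "[" + a.begin() + "," + a.end() + "] " + a.arrayValues();
    }

    // Sorts by begin, end, type name, then array contents, so the same set of
    // annotations always yields the same list of lines regardless of input order.
    static List<String> toLines(List<Ann> anns) {
        List<Ann> sorted = new ArrayList<>(anns);
        sorted.sort(Comparator.comparingInt(Ann::begin)
                .thenComparingInt(Ann::end)
                .thenComparing(Ann::type)
                .thenComparing(a -> a.arrayValues().toString()));
        List<String> lines = new ArrayList<>();
        for (Ann a : sorted) {
            lines.add(render(a));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Ann> cas1 = List.of(new Ann("Sentence", 0, 12, List.of()),
                                 new Ann("EventMention", 4, 9, List.of("C0011849")));
        List<Ann> cas2 = List.of(new Ann("EventMention", 4, 9, List.of("C0011849")),
                                 new Ann("Sentence", 0, 12, List.of()));
        // Same annotations, different order: the rendered lines are identical.
        System.out.println(toLines(cas1).equals(toLines(cas2))); // true
    }
}
```

The type names and CUI string in `main` are illustrative only. Records and the lambda-based comparator assume Java 16 or later.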
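The feature-walking comparison described for CompareFeatureStructures could look roughly like the sketch below. This is not that class's code. It assumes a hypothetical representation in which a feature structure is a `Map` from feature name to a primitive value, a nested `Map`, or a `List` standing in for an FSArray/FSList; the feature name `ontologyConceptArr` and the CUI strings are purely illustrative. It records the path of every differing leaf, and it does not guard against cycles, which a real CAS can contain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

public class FeatureTreeDiff {
    // Recursively compares two feature trees and appends one message per difference.
    // Hypothetical representation (not the UIMA API): Map = feature structure with
    // String feature names, List = FSArray/FSList, anything else = primitive value.
    static void compare(String path, Object a, Object b, List<String> out) {
        if (a instanceof Map<?, ?> ma && b instanceof Map<?, ?> mb) {
            // Visit the union of feature names in sorted order for stable output.
            TreeSet<String> keys = new TreeSet<>();
            for (Object k : ma.keySet()) keys.add(k.toString());
            for (Object k : mb.keySet()) keys.add(k.toString());
            for (String k : keys) {
                compare(path + "/" + k, ma.get(k), mb.get(k), out);
            }
        } else if (a instanceof List<?> la && b instanceof List<?> lb) {
            if (la.size() != lb.size()) {
                out.add(path + ": size " + la.size() + " vs " + lb.size());
            }
            // Compare the elements the two lists have in common, position by position.
            int n = Math.min(la.size(), lb.size());
            for (int i = 0; i < n; i++) {
                compare(path + "[" + i + "]", la.get(i), lb.get(i), out);
            }
        } else if (!Objects.equals(a, b)) {
            // Leaf mismatch, a missing feature (null), or a shape mismatch.
            out.add(path + ": " + a + " vs " + b);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> fs1 = Map.of("begin", 4, "end", 9,
                "ontologyConceptArr", List.of(Map.of("cui", "C0011849")));
        Map<String, Object> fs2 = Map.of("begin", 4, "end", 9,
                "ontologyConceptArr", List.of(Map.of("cui", "C0011860")));
        List<String> diffs = new ArrayList<>();
        compare("", fs1, fs2, diffs);
        System.out.println(diffs); // [/ontologyConceptArr[0]/cui: C0011849 vs C0011860]
    }
}
```

Reporting paths rather than relying on .toString() means values nested inside arrays show up explicitly in the output.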
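For the insertion/deletion problem, one option is a standard longest-common-subsequence (LCS) diff over the sorted, rendered annotation lines. The sketch below hand-rolls that LCS algorithm rather than calling java-diff-utils, so it makes no claims about that library's API. It assumes both inputs were ordered consistently (e.g. by begin offset, then end offset); an annotation that changed will surface as a deletion plus an insertion, and such pairs could then be handed to a pairwise feature comparison.

```java
import java.util.ArrayList;
import java.util.List;

public class LcsDiff {
    // Produces "  x" for lines kept, "- x" for lines only in a (deletions),
    // and "+ x" for lines only in b (insertions).
    static List<String> diff(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk the table from the start, emitting kept, deleted, and inserted lines.
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                out.add("  " + a.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + a.get(i));
                i++;
            } else {
                out.add("+ " + b.get(j));
                j++;
            }
        }
        while (i < n) out.add("- " + a.get(i++));
        while (j < m) out.add("+ " + b.get(j++));
        return out;
    }

    public static void main(String[] args) {
        List<String> cas1 = List.of("Sentence[0,12]", "EventMention[4,9]", "TimeMention[10,12]");
        List<String> cas2 = List.of("EntityMention[0,3]", "Sentence[0,12]", "EventMention[4,9]");
        // Prints:
        // + EntityMention[0,3]
        //   Sentence[0,12]
        //   EventMention[4,9]
        // - TimeMention[10,12]
        diff(cas1, cas2).forEach(System.out::println);
    }
}
```

The table costs O(n*m) time and memory, which is fine for a single document but may be worth revisiting for very large CASes.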

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
