ctakes-notifications mailing list archives

From "James Joseph Masanz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CTAKES-217) create a tool for "diff"-ing two CASes
Date Thu, 18 Jul 2013 01:50:49 GMT

    [ https://issues.apache.org/jira/browse/CTAKES-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711921#comment-13711921 ]

James Joseph Masanz commented on CTAKES-217:

have you taken a look at 

Without looking at the source, I've forgotten most of the little I once knew about it.
But we had suggested it in cTAKES 1.0 to help people compare at least some parts.
Maybe you will find some part of it helpful?

-- James

> create a tool for "diff"-ing two CASes
> --------------------------------------
>                 Key: CTAKES-217
>                 URL: https://issues.apache.org/jira/browse/CTAKES-217
>             Project: cTAKES
>          Issue Type: New Feature
>            Reporter: Steven Bethard
> It would be handy to be able to easily get a "diff" of two CASes. Some possibilities:
> (1) Just diff the XMIs. This doesn't work very well because the IDs are typically different in different XMIs generated from the same annotations.
> (2) Output all annotations, using their .toString(), and diff that file using a standard diff algorithm. This might mostly work if we could guarantee a consistent ordering of the annotations in the CAS. (That's easy to do for Annotations, but not always possible for TOPs.) But some things aren't displayed in the .toString(), e.g. the values inside FSArrays and FSLists.
> In r1504269, I added CompareFeatureStructures which isn't either of these, but is a bit closer to (2). It sorts annotations by offset (and for TOPs, looks through their features to find offsets), and then compares each pair of FeatureStructures by walking the tree of their features. I'm mostly happy with how it handles the comparison of two FeatureStructures (though .toString() is a bit hacky).
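
The approach described above (sort by offset, then walk the features of each pair) can be sketched roughly as follows. This is not the CompareFeatureStructures code from r1504269; it is a minimal Python illustration that uses plain dicts and lists as stand-ins for FeatureStructures and FSArrays, and the names `offset_key`, `compare`, and `compare_all` are hypothetical.

```python
def offset_key(fs):
    """Sort key (begin, end). Structures without their own offsets
    (stand-ins for TOPs) borrow the smallest offsets found in their features."""
    if "begin" in fs and "end" in fs:
        return (fs["begin"], fs["end"])
    nested = []
    for value in fs.values():
        if isinstance(value, dict):
            nested.append(offset_key(value))
        elif isinstance(value, list):
            nested.extend(offset_key(v) for v in value if isinstance(v, dict))
    return min(nested) if nested else (float("inf"), float("inf"))


def compare(a, b, path):
    """Walk two structures in parallel and return a list of differences."""
    if isinstance(a, dict) and isinstance(b, dict):
        diffs = []
        for name in sorted(set(a) | set(b)):
            if name not in a or name not in b:
                diffs.append(f"{path}.{name}: present in only one structure")
            else:
                diffs.extend(compare(a[name], b[name], f"{path}.{name}"))
        return diffs
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            return [f"{path}: lengths differ ({len(a)} vs {len(b)})"]
        diffs = []
        for i, (x, y) in enumerate(zip(a, b)):
            diffs.extend(compare(x, y, f"{path}[{i}]"))
        return diffs
    # Primitive values (or mismatched kinds) are compared directly.
    return [] if a == b else [f"{path}: {a!r} != {b!r}"]


def compare_all(left, right):
    """Sort both collections by offset, then compare them pairwise."""
    left = sorted(left, key=offset_key)
    right = sorted(right, key=offset_key)
    if len(left) != len(right):
        # Same limitation as described in the issue: no alignment is attempted.
        return [f"different numbers of structures: {len(left)} vs {len(right)}"]
    diffs = []
    for i, (a, b) in enumerate(zip(left, right)):
        diffs.extend(compare(a, b, f"fs[{i}]"))
    return diffs
```

Because the walk descends into lists and nested structures, differences inside array-like features are reported too, which a plain .toString() comparison would miss.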
> The main issue is that it doesn't really do anything useful if you have different numbers of annotations in the two CASes. It just prints a message saying that the numbers are different. Instead, it should be able to identify insertions and deletions of annotations. Probably there's a way to do this with java-diff-utils, though I wasn't able to figure one out on my first
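
One possible way to identify insertions and deletions is to reduce each annotation to a comparable key and run a sequence diff over the two sorted key lists. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in for java-diff-utils, not that library's API. The dict-based annotations, the `(type, begin, end)` key, and the function names are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def annotation_key(ann):
    """Hashable identity used for alignment (an assumed choice of key)."""
    return (ann["type"], ann["begin"], ann["end"])


def align(left, right):
    """Return (op, left_ann, right_ann) triples, where op is
    'match', 'delete' (only in left), or 'insert' (only in right)."""
    def by_offset(ann):
        return (ann["begin"], ann["end"], ann["type"])

    left = sorted(left, key=by_offset)
    right = sorted(right, key=by_offset)
    matcher = SequenceMatcher(
        a=[annotation_key(x) for x in left],
        b=[annotation_key(x) for x in right],
        autojunk=False,
    )
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.extend(("match", l, r) for l, r in zip(left[i1:i2], right[j1:j2]))
        else:
            # 'delete', 'insert', and 'replace' all reduce to removals from
            # the left side followed by additions from the right side.
            ops.extend(("delete", l, None) for l in left[i1:i2])
            ops.extend(("insert", None, r) for r in right[j1:j2])
    return ops
```

Pairs reported as `match` could then be handed to a feature-by-feature comparison, while `insert` and `delete` entries are reported directly instead of stopping at a count mismatch.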

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
