commons-issues mailing list archives

From "Thomas Neidhart (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COLLECTIONS-404) Adding an implementation of Eugene Myers difference algorithm
Date Mon, 29 Apr 2013 19:38:16 GMT

    [ https://issues.apache.org/jira/browse/COLLECTIONS-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644782#comment-13644782 ]

Thomas Neidhart commented on COLLECTIONS-404:
---------------------------------------------

Moved the package in r1477287.
Additionally, following best practice in Commons, I made the object field in EditCommand
private and added a getter.

For the commands, I am now unsure whether the refactoring really makes sense. We could change
the append methods in EditScript to mirror the Visitor (e.g. appendInsertCommand,
appendKeepCommand, ...) and thus completely hide this implementation detail inside EditScript,
which is a good thing in Commons given the strict API rules. On the other hand, the current
API is also good OO design, so I am inclined to keep it as is.
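
To make the alternative concrete, here is a minimal sketch of an EditScript whose append
methods mirror the Visitor. It is not the code committed in r1477287: the names
CommandVisitorSketch, EditScriptSketch, Entry and Kind are hypothetical, and the internal
representation is only one possible choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor, modelled on the Visitor mentioned above.
interface CommandVisitorSketch<T> {
    void visitInsertCommand(T object);
    void visitKeepCommand(T object);
    void visitDeleteCommand(T object);
}

// Hypothetical EditScript variant: callers never see a command class.
final class EditScriptSketch<T> {
    private enum Kind { INSERT, KEEP, DELETE }

    // Internal representation only; it could change without affecting the API.
    private static final class Entry<E> {
        final Kind kind;
        final E object;

        Entry(Kind kind, E object) {
            this.kind = kind;
            this.object = object;
        }
    }

    private final List<Entry<T>> entries = new ArrayList<>();
    private int modifications; // counts insert and delete commands only

    public void appendInsertCommand(T object) {
        entries.add(new Entry<>(Kind.INSERT, object));
        modifications++;
    }

    public void appendKeepCommand(T object) {
        entries.add(new Entry<>(Kind.KEEP, object));
    }

    public void appendDeleteCommand(T object) {
        entries.add(new Entry<>(Kind.DELETE, object));
        modifications++;
    }

    // Replays the script in order, dispatching each entry to the visitor.
    public void visit(CommandVisitorSketch<T> visitor) {
        for (Entry<T> entry : entries) {
            switch (entry.kind) {
                case INSERT:
                    visitor.visitInsertCommand(entry.object);
                    break;
                case KEEP:
                    visitor.visitKeepCommand(entry.object);
                    break;
                default:
                    visitor.visitDeleteCommand(entry.object);
                    break;
            }
        }
    }

    public int getModifications() {
        return modifications;
    }
}
```

With this shape the public surface is just the three append methods, visit and
getModifications, so the command representation could later be changed without breaking
binary compatibility.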

My original idea was to merge commands: the EditScript would check whether the last command
is of the same type as the current one and, if so, merge them, so each command would hold a
list of T instead of a single T. This would save memory, as we would not need to instantiate
a new command for every element in a sequence of equal commands (which can be an issue for
large sequences). The trade-off is that we would have to create a List for each command, so
the gain may not be as great as originally thought.
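
A rough sketch of the merging idea, under the assumption that a merged command is simply a
command type plus a list of objects. MergingEditScriptSketch, Run and Kind are hypothetical
names used for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical script that merges consecutive commands of the same type.
final class MergingEditScriptSketch<T> {
    enum Kind { INSERT, KEEP, DELETE }

    // One merged command: a type and every object it applies to, in order.
    static final class Run<E> {
        private final Kind kind;
        private final List<E> objects = new ArrayList<>();

        Run(Kind kind) {
            this.kind = kind;
        }

        Kind getKind() {
            return kind;
        }

        List<E> getObjects() {
            return Collections.unmodifiableList(objects);
        }
    }

    private final List<Run<T>> runs = new ArrayList<>();

    // Extends the last run if it has the same type, otherwise starts a new one.
    public void append(Kind kind, T object) {
        Run<T> last = runs.isEmpty() ? null : runs.get(runs.size() - 1);
        if (last == null || last.kind != kind) {
            last = new Run<>(kind);
            runs.add(last);
        }
        last.objects.add(object);
    }

    public List<Run<T>> getRuns() {
        return Collections.unmodifiableList(runs);
    }
}
```

The trade-off is visible here: every Run allocates its own ArrayList, so a script that
alternates between command types pays for one list per element and ends up worse off than
with single-object commands. Only long runs of the same type benefit.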
                
> Adding an implementation of Eugene Myers difference algorithm
> -------------------------------------------------------------
>
>                 Key: COLLECTIONS-404
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-404
>             Project: Commons Collections
>          Issue Type: Improvement
>          Components: Collection
>    Affects Versions: 3.2.1
>         Environment: all
>            Reporter: Luc Maisonobe
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: commons-collections-difference.patch, commons-collections-difference-v2.patch, comparator.zip, DiffTest.java
>
>
> The difference algorithm compares two sequences of objects and returns an "edit script"
> that describes how to transform the first sequence into the second. The script consists
> of insert object, delete object and keep object commands, and is guaranteed to be the
> shortest possible in terms of number of commands.
> From the script, one can extract either the longest common subsequence (i.e. how similar
> the sequences are) or, conversely, the needed changes (i.e. how different the sequences
> are).
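
To illustrate what such an edit script looks like, here is a small self-contained sketch.
It does not implement Myers' algorithm and is not the attached patch: it uses a plain
dynamic-programming longest-common-subsequence table, which is simpler but needs O(n*m)
time and memory. The class DiffSketch and the string encoding of commands ("=" keep,
"-" delete, "+" insert) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; uses a DP LCS table as a stand-in for Myers' algorithm.
final class DiffSketch {
    private DiffSketch() {
    }

    // Returns commands that turn 'first' into 'second', e.g. ["=a", "-b", "+c"].
    static <T> List<String> editScript(List<T> first, List<T> second) {
        int n = first.size();
        int m = second.size();

        // lcs[i][j] = length of the LCS of first[i..] and second[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = first.get(i).equals(second.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table from the start, emitting one command per step.
        List<String> script = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (first.get(i).equals(second.get(j))) {
                script.add("=" + first.get(i)); // keep: part of the LCS
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                script.add("-" + first.get(i)); // delete from first
                i++;
            } else {
                script.add("+" + second.get(j)); // insert from second
                j++;
            }
        }
        while (i < n) {
            script.add("-" + first.get(i++));
        }
        while (j < m) {
            script.add("+" + second.get(j++));
        }
        return script;
    }
}
```

In the returned list, the "=" entries read in order form a longest common subsequence,
while the "-" and "+" entries are the changes needed to go from the first sequence to the
second.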

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
