commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Luc Maisonobe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COLLECTIONS-404) Adding an implementation of Eugene Myers difference algorithm
Date Mon, 29 Apr 2013 20:20:16 GMT

    [ https://issues.apache.org/jira/browse/COLLECTIONS-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644822#comment-13644822
] 

Luc Maisonobe commented on COLLECTIONS-404:
-------------------------------------------

Your idea makes sense. The algorithm is mainly dedicated to cases where the number of changes
is small with respect to the sequence size. It is O(nd) where n is the sequence length and
d is the numbers of differences. It is a good algorithm when d is orders of magnitude smaller
than n.

This means that it really expect to have at least a large number of "keep" commands and hence
a large number of consecutive ones can be expected. So at least for "keep" commands, automatically
collating them is good.
                
> Adding an implementation of Eugene Myers difference algorithm
> -------------------------------------------------------------
>
>                 Key: COLLECTIONS-404
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-404
>             Project: Commons Collections
>          Issue Type: Improvement
>          Components: Collection
>    Affects Versions: 3.2.1
>         Environment: all
>            Reporter: Luc Maisonobe
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: commons-collections-difference.patch, commons-collections-difference-v2.patch,
comparator.zip, DiffTest.java
>
>
> The difference algorithm aims at comparing two sequences of objects and return an "edit
script" which represents how one can transform the first sequence into the second sequence.
The script describes the various insert object, delete object and keep object commands. The
script is guaranteed to be the shortest possible in terms of number of commands.
> From the script, one can either extract longest common sub-sequences (i.e. how similar
the sequences are) or on the contrary the needed changes (i.e. how different the sequences
are).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message