subversion-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Foad <julian.f...@wandisco.com>
Subject Re: diff-optimizations-bytes: how to make diff3 work with prefix/suffix scanning
Date Wed, 05 Jan 2011 11:11:01 GMT
On Sat, 2011-01-01, Johan Corveleyn wrote:
> [ Taking a privately-started discussion with danielsh to the list, in
> case others have inspiration/insight about this. Question at hand: I'm
> having trouble making diff3 work with prefix/suffix scanning of the
> diff-optimizations-bytes branch. Any feedback is highly appreciated
> :-). ]
> 
> On Fri, Dec 31, 2010 at 2:38 PM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> [...snip...]
> > In diff3 with prefix enabled, I was seeing failures like this:
> >
> > EXPECTED:
> > line1
> > <<<
> > line2
> > ===
> > line2 modified
> >>>>
> >
> > ACTUAL:
> > <<<
> > line1
> > line2
> > ===
> > line1
> > line2 modified
> >>>>
> >
> > so I assumed that when the prefix code gobbled the shared prefix.  How
> > to fix it from there was purely guesswork on my part (and I haven't
> > implemented it).
> >
> > These failures would go away if I prepended a "line0 left" and "line0
> > right" to the file.
> 
> Yes, I know. That's indeed because the prefix-scanning code gobbled up
> the identical prefix. That problem is also there in diff2.
> 
> In diff2, I fixed this by simply pre-pending a piece of "common" diff
> to the diff linked list, in the function diff.c#svn_diff__diff.
> 
> >From r1037371:
> [[[
> --- subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
> (original)
> +++ subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
> Sun Nov 21 02:36:07 2010
> @@ -43,6 +43,22 @@ svn_diff__diff(svn_diff__lcs_t *lcs,
>   svn_diff_t *diff;
>   svn_diff_t **diff_ref = &diff;
> 
> +  if (want_common && (original_start > 1))
> +    {
> +      /* we have a prefix to skip */
> +      (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref));
> +
> +      (*diff_ref)->type = svn_diff__type_common;
> +      (*diff_ref)->original_start = 0;
> +      (*diff_ref)->original_length = original_start - 1;
> +      (*diff_ref)->modified_start = 0;
> +      (*diff_ref)->modified_length = modified_start - 1;
> +      (*diff_ref)->latest_start = 0;
> +      (*diff_ref)->latest_length = 0;
> +
> +      diff_ref = &(*diff_ref)->next;
> +    }

Hi Johan.

I know this is work in progress but that whole "if" block of code is a
good example of where a comment above the block would be *really* useful
to tell the reader what it's for.  :-)

(Also, all that dereferencing is unnecessary in that particular block,
but I looked at the rest of the function and saw that that style is used
in two other blocks where it is not unnecessary.  Personally I would
find the following style more readable throughout the function:

       {
         svn_diff_t *new_diff = apr_palloc(pool, sizeof(*new_diff));
         new_diff->type = ...
         ...

         *diff_ref = new_diff;
         diff_ref = &new_diff->next;
       }
)

>   while (1)
>     {
>       if (original_start < lcs->position[0]->offset
> ]]]
> 
> Naively, I wanted to do the same for diff3 (along with some other
> tricks, to make the algorithm aware of the prefix_lines), but it
> doesn't work. See the patch in attachment.
> 
> I mean: it works partly: diff-diff3-test.exe passes with the attached
> patch, but merge_reintegrate_tests.py (1, 2, 10, 13 and 14) and
> merge_tests.py (61) don't. I haven't studied the test failures in
> detail (I probably should, but I currently can't wrap my head around
> those tests).
> 
> I'm now pondering another approach, to pre-pend an "lcs" piece to the
> lcs linked list, and let the rest of the algorithm do its work as
> normal. Inspiration for this came when I was reading the code of
> diff3.c#svn_diff__resolve_conflict, where a similar trick is used
> around line 121:
> [[[
>     /* If there were matching symbols at the start of
>      * both sequences, record that fact.
>      */
>     if (common_length > 0)
>       {
>         lcs = apr_palloc(subpool, sizeof(*lcs));
>         lcs->next = NULL;
>         lcs->position[0] = start_position[0];
>         lcs->position[1] = start_position[1];
>         lcs->length = common_length;
> 
>         lcs_ref = &lcs->next;
>       }
> ]]]
> 
> Maybe I can even integrate this logic into lcs.c#svn_diff__lcs, where
> I'm already passing the prefix_lines as a new parameter. If that would
> work out, I could probably remove the special common-diff-injecting
> code in diff.c#svn_diff__diff.
> 
> Thoughts?

Just from reading what you say here, that last solution sounds much
better.

- Julian


> It will take me some time/experimentation to work out the details, but
> I think that should work ...
> 
> Cheers,



Mime
View raw message