subversion-users mailing list archives

From Stefan Sperling
Subject Re: SVN Blame Returns Corrupt Data
Date Fri, 11 Oct 2013 17:25:19 GMT
On Fri, Oct 11, 2013 at 09:52:31AM -0700, Ben Reser wrote:
> On 10/11/13 9:22 AM, Branko Čibej wrote:
> > You'd have to extend Subversion's file type detection to detect UTF-16.
> > See svn_io_detect_mimetype2 in line 3333 in this file:
> > 
> >
> > Subversion currently only looks at the first 1k Bytes of a file. It may
> > be enough to check that this initial part of the file contains only
> > valid UTF-16 (BE or LE) codes.
> Even if all we looked for is the BOM it might be helpful enough.  I suspect the
> development tools producing UTF-16 are including BOMs.  Windows seems to be
> fond of including them, Notepad puts one even on UTF-8.
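
For illustration, the two ideas quoted above (look for a BOM, then check that the sniffed prefix holds only valid UTF-16) could be combined roughly as follows. This is a minimal Python sketch, not Subversion's C code: the function name sniff_utf16 and the constant SNIFF_BYTES are hypothetical, and the 1024-byte window is an assumption taken from the "first 1k Bytes" remark.

```python
import codecs

# Assumed sniff window, based on the "first 1k Bytes" remark in the thread.
SNIFF_BYTES = 1024


def sniff_utf16(head: bytes):
    """Return 'utf-16-le', 'utf-16-be', or None for the leading bytes of a file.

    Hypothetical helper: requires a BOM, then checks that the rest of the
    sniffed prefix decodes as UTF-16 in the byte order the BOM announces.
    """
    head = head[:SNIFF_BYTES]

    # The UTF-32 LE BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE),
    # so rule it out first. A UTF-16 LE file whose first character is U+0000
    # would be misjudged here; this sketch accepts that.
    if head.startswith(codecs.BOM_UTF32_LE):
        return None

    if head.startswith(codecs.BOM_UTF16_LE):
        encoding = "utf-16-le"
    elif head.startswith(codecs.BOM_UTF16_BE):
        encoding = "utf-16-be"
    else:
        return None

    body = head[2:]
    # Drop a trailing odd byte: the sniff window may cut a code unit in half.
    body = body[: len(body) - (len(body) % 2)]

    try:
        # final=False tolerates a surrogate pair split by the window edge,
        # but still rejects unpaired surrogates inside the prefix.
        codecs.getincrementaldecoder(encoding)().decode(body, final=False)
    except UnicodeDecodeError:
        return None
    return encoding
```

A caller would pass in whatever prefix it has already read, for example sniff_utf16(open(path, "rb").read(SNIFF_BYTES)). Files without a BOM are reported as None here; detecting BOM-less UTF-16 would need a heuristic that this sketch does not attempt.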

Couldn't Subversion automatically convert UTF-16 files to UTF-8 before
processing them for diff/merge/blame, and convert output written to
the original files back to UTF-16?

That would require some work because existing streams, strings, and files
passed around in the code would need to be wrapped so that translation
between the internal and the external encoding is seamless.
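
To make the round trip concrete, here is a rough Python sketch of the translation step on whole buffers. It is not Subversion code: the names to_internal and to_external are hypothetical, it assumes the file carries a BOM, and a real implementation would wrap streams rather than convert entire buffers in memory.

```python
import codecs

# Map each supported external encoding to the BOM that announces it.
_BOMS = {
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
}


def to_internal(raw: bytes):
    """Translate BOM-prefixed UTF-16 bytes to UTF-8.

    Returns (utf8_bytes, external_encoding). Anything without a UTF-16 BOM
    is passed through unchanged with an encoding of None.
    """
    for encoding, bom in _BOMS.items():
        if raw.startswith(bom):
            text = raw[len(bom):].decode(encoding)
            return text.encode("utf-8"), encoding
    return raw, None


def to_external(utf8: bytes, encoding):
    """Translate UTF-8 bytes back to the remembered external encoding."""
    if encoding is None:
        return utf8
    return _BOMS[encoding] + utf8.decode("utf-8").encode(encoding)
```

The idea is that diff, merge, and blame would only ever see the UTF-8 form, and the remembered encoding would be used to restore the BOM and byte order when the result is written back to the working file.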

But I don't see why such an approach couldn't be made to work in principle.
It might even result in some spring cleaning in the code base and pave the
way for improved handling of file formats such as XML for diff and merge.

What do you think? Is it worth adding this to our project ideas page?
