creadur-dev mailing list archives

From sebb <seb...@gmail.com>
Subject Re: RAT-190 - default encoding UTF-8 / patch / what should be implemented?
Date Wed, 18 Feb 2015 02:23:37 GMT
On 17 February 2015 at 22:59, P. Ottlinger <pottlinger@apache.org> wrote:
> Hi *,
>
> after finalizing the analysis on
> https://issues.apache.org/jira/browse/RAT-190
> it seems that RAT is not explicit enough when it comes to encoding.
>
> CAUSE/BUG BACKGROUND
> If mvn is configured to run with a non-UTF-8 encoding, there will be
> problems when matching UTF-8 content against licenses.
>
> PATCH PROPOSAL
> I've gone through some parts of the code and added "UTF-8" explicitly in
> several places to make it clearer that UTF-8 should be the default. What
> do you think of that proposal?
>
> YOUR FEEDBACK WANTED
> 1) Is it sufficient?
> 2a) Should we have a RAT configuration option that lets users set the
> encoding explicitly, with UTF-8 as the default if nothing is configured?
> 2b) Should we just use UTF-8 as the default (hardcoded) and not give the
> user a chance to set the encoding?
>
> IMPROVE TESTABILITY?
> Since our Jenkins builds seem to run with UTF-8 encoding, we did not see
> these problems before. Does anyone have a good idea how to test this?
> Should we analyse a UTF-8 encoded file with mvn running under a
> -Dfile.encoding other than UTF-8?
>
> Cheers & thanks for any opinions :-)
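To make the failure mode concrete, here is a minimal Java sketch. It is not RAT code; the class name and the sample header text are made up for illustration. It encodes a header containing a non-ASCII character (the copyright sign) as UTF-8, then decodes the same bytes with ISO-8859-1, which is roughly what happens when a reader falls back to a non-UTF-8 platform default. Because the charset is passed in explicitly, the same idea could also serve as a test that does not depend on how the JVM's file.encoding was set.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Apache RAT.
public class EncodingMismatchDemo {

    // Decodes raw file bytes with the given charset and checks whether
    // the expected license text appears in the decoded content.
    public static boolean matches(byte[] fileBytes, Charset charset, String expected) {
        String content = new String(fileBytes, charset);
        return content.contains(expected);
    }

    public static void main(String[] args) {
        // Sample header containing the copyright sign (written as a Unicode
        // escape so this source file itself stays pure ASCII).
        String header = "Copyright \u00A9 2015 Example Author";
        // Simulates the bytes of a UTF-8 encoded source file on disk.
        byte[] utf8Bytes = header.getBytes(StandardCharsets.UTF_8);

        // Decoding with the file's real encoding finds the text.
        System.out.println(matches(utf8Bytes, StandardCharsets.UTF_8, "Copyright \u00A9"));      // true
        // Decoding with ISO-8859-1 turns the two-byte UTF-8 sequence into
        // two characters, so the expected text is no longer found.
        System.out.println(matches(utf8Bytes, StandardCharsets.ISO_8859_1, "Copyright \u00A9")); // false
    }
}
```

Note that plain ASCII headers decode identically under both charsets, which would explain why the problem only shows up with non-ASCII content.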

Seems to me that there are several potentially different encodings
involved here.

1) The encoding used for the license file templates.
Templates defined in built-in strings should not be an issue, but
the templates can also be provided externally, either as files or as
part of the pom.

2) The encoding used for the files being checked.

I think we can assume that all the source files will have the same
encoding, but that may differ from the templates.
I assume that Maven takes care of the encoding when interpreting the pom.

If external templates are used, then can we insist that these use the
same encoding as the source files?
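If option 2a were chosen, one possible shape for the settings follows the split above: one encoding for the files being checked and one for externally supplied templates, with the template encoding falling back to the source encoding and the source encoding falling back to UTF-8. This is only a sketch; `EncodingSettings` and its methods are hypothetical names, not existing RAT or Maven plugin options.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical settings holder; these are not existing RAT options.
public class EncodingSettings {
    private final Charset sourceEncoding;
    private final Charset templateEncoding;

    // Either argument may be null, meaning "not configured".
    public EncodingSettings(String sourceEncoding, String templateEncoding) {
        // Files being checked: explicit setting, else UTF-8
        // (never the platform default charset).
        this.sourceEncoding = sourceEncoding != null
                ? Charset.forName(sourceEncoding)
                : StandardCharsets.UTF_8;
        // External templates: explicit setting, else the same
        // encoding as the source files.
        this.templateEncoding = templateEncoding != null
                ? Charset.forName(templateEncoding)
                : this.sourceEncoding;
    }

    public Charset sourceEncoding() {
        return sourceEncoding;
    }

    public Charset templateEncoding() {
        return templateEncoding;
    }

    // Reads a file under check using the configured source encoding.
    public String readSource(Path file) throws IOException {
        return new String(Files.readAllBytes(file), sourceEncoding);
    }

    // Reads an externally provided template using the template encoding.
    public String readTemplate(Path file) throws IOException {
        return new String(Files.readAllBytes(file), templateEncoding);
    }
}
```

Defaulting the template encoding to the source encoding amounts to "insisting" on the same encoding unless the user says otherwise, while still leaving an escape hatch for templates stored differently.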

Does RAT include any template files? If so, presumably they have a
fixed (known) encoding.
Does RAT know when reading a template file whether it is a 3rd party
file or not?
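If RAT does ship its own template files, one option would be to read those with a fixed charset and apply the configurable one only to third-party templates. A minimal sketch, assuming the caller knows where each template came from and marks it with a hypothetical `TemplateOrigin` flag; the class name and the choice of UTF-8 for bundled templates are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not existing RAT code.
public class TemplateReader {

    // Where a template came from; the caller is assumed to know this.
    public enum TemplateOrigin { BUILT_IN, EXTERNAL }

    // Bundled templates have a fixed, known encoding (assumed UTF-8 here);
    // third-party templates use whatever encoding the user configured.
    public static Charset charsetFor(TemplateOrigin origin, Charset configured) {
        return origin == TemplateOrigin.BUILT_IN ? StandardCharsets.UTF_8 : configured;
    }

    // Reads the whole stream, then decodes it with the charset chosen above.
    public static String read(InputStream in, TemplateOrigin origin, Charset configured)
            throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charsetFor(origin, configured));
    }
}
```

A bundled template could be opened with `Class.getResourceAsStream` and passed in as `BUILT_IN`, so a misconfigured user encoding would only ever affect the user's own templates.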

> Phil
