commons-issues mailing list archives

From "Anson Schwabecher (JIRA)" <>
Subject [jira] [Commented] (CSV-226) Add CSVParser test case for standard charsets
Date Thu, 31 May 2018 00:47:00 GMT


Anson Schwabecher commented on CSV-226:


I'm not sure we can control whether a file is truly treated as binary within git itself.

However, we can control how a cloned repo checks files in and out with .gitattributes.

To prevent git from auto-converting CRLF sequences, I would use the -text attribute.
This might be helpful if you were testing how CRLF sequences are consumed in the Lexer
and wanted specific CRLF sequences stored in the test fixture.
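A throwaway repo demonstrates the effect (assumes git is on the PATH; the file names here are hypothetical, not part of the Commons CSV tree):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git config user.email you@example.com
git config user.name you
git config core.autocrlf input            # normalize CRLF -> LF on add

printf 'keep.csv -text\n' > .gitattributes
printf 'a,b\r\n' > norm.csv               # CRLF, no attribute: normalized
printf 'a,b\r\n' > keep.csv               # CRLF, -text set: stored verbatim
git add -A

git cat-file -p :norm.csv | od -c         # no \r in the stored blob
git cat-file -p :keep.csv | od -c         # \r \n preserved
```

The `-text` file keeps its CRLF bytes in the object database even though autocrlf would otherwise normalize them.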

I am sure that the existing test fixture files are not considered binary: if they were,
git diff would report "Binary files xxx differ" when I change them, and it does not.


This command, run from the project root, shows that no attributes have been set on the
test fixture files:

{{git check-attr -a src/test/resources/**/*}}


If you wanted to add a project-specific .gitattributes file at the project root that mandates
no CRLF conversion for test fixture files, you could do so like this:

{{echo "src/test/resources/**/* -text" > .gitattributes}}
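In a scratch repo, check-attr then confirms the setting took effect (the fixture path below is made up for the demo):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
mkdir -p src/test/resources
touch src/test/resources/fixture.csv
echo "src/test/resources/**/* -text" > .gitattributes
git check-attr text src/test/resources/fixture.csv
# -> src/test/resources/fixture.csv: text: unset
```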


I don't think you want them treated as binary (the -diff attribute), because that would
disable text diffing as well.
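A quick scratch-repo sketch of that downside: with the built-in `binary` macro (which expands to `-diff -merge -text`), a one-line change no longer produces a readable diff (file names here are hypothetical):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git config user.email you@example.com && git config user.name you
echo "*.csv binary" > .gitattributes
printf 'a,b\n' > f.csv
git add -A && git commit -qm fixtures
printf 'a,b\nc,d\n' > f.csv
git diff -- f.csv                         # "Binary files ... differ"
```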




> Add CSVParser test case for standard charsets
> ---------------------------------------------
>                 Key: CSV-226
>                 URL:
>             Project: Commons CSV
>          Issue Type: Test
>          Components: Parser
>    Affects Versions: 1.5
>            Reporter: Anson Schwabecher
>            Priority: Minor
> Hello, I'd like to contribute a CSVParser test suite for standard charsets as defined
in java.nio.charset.StandardCharsets + UTF-32.
> This is a standalone test but is also in support of a fix for CSV-107.  It also refactors
and unifies the testing around your established workaround of inserting BOMInputStream ahead
of the CSVParser.
> It will take a single base UTF-8 encoded file (cstest.csv) and copy it to multiple output
files (in target dir) with differing character sets, similar to the iconv tool.  Each file
will then be fed into the parser to test all the BOM/no-BOM Unicode variants.  I think a
file-based approach is still important here, rather than just encoding a character stream
inline as a string; that way, if issues develop, it's easy to inspect the data.
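That copy step can be sketched in shell with iconv itself (the charset list is illustrative; note iconv does not add BOMs for the explicit LE/BE variants):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
printf 'a,b,c\n1,2,3\n' > cstest.csv                   # base UTF-8 fixture
for cs in UTF-16LE UTF-16BE UTF-32LE UTF-32BE; do
  iconv -f UTF-8 -t "$cs" cstest.csv > "cstest.$cs.csv"
done
# round-trip one variant to confirm the data survives transcoding
iconv -f UTF-16LE -t UTF-8 cstest.UTF-16LE.csv | cmp -s - cstest.csv && echo OK
```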
> I noticed in the project’s pom.xml (rat config) that you are excluding individual test
resource files by name rather than using a wildcard expression to exclude every file in the
directory.  Is there a reason for this? It’s much better if devs do not have to maintain
this configuration.
> i.e.: switch over to a single exclude expression:
> {{<exclude>src/test/resources/**/*</exclude>}}
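For reference, the single exclusion would sit in the apache-rat-plugin configuration roughly like this (a sketch; the existing plugin block in pom.xml should be adapted rather than replaced):

```xml
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <configuration>
    <excludes>
      <exclude>src/test/resources/**/*</exclude>
    </excludes>
  </configuration>
</plugin>
```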

This message was sent by Atlassian JIRA
