metamodel-dev mailing list archives

From "Samuel Mumm (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (METAMODEL-1086) Encoding not used with InputStreams in CsvDataContext
Date Fri, 20 May 2016 13:55:12 GMT

    [ https://issues.apache.org/jira/browse/METAMODEL-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293404#comment-15293404 ]

Samuel Mumm commented on METAMODEL-1086:
----------------------------------------

Just saw there is a GitHub repo. Will try to provide a pull request there soon.

> Encoding not used with InputStreams in CsvDataContext
> -----------------------------------------------------
>
>                 Key: METAMODEL-1086
>                 URL: https://issues.apache.org/jira/browse/METAMODEL-1086
>             Project: Apache MetaModel
>          Issue Type: Bug
>    Affects Versions: 4.5.2
>            Reporter: Samuel Mumm
>
> When using the constructor that takes an InputStream, you can run into encoding trouble if the default encoding of your platform differs from the one used in the InputStream, even though you specify an encoding in the CsvConfiguration.
> {code}
> CsvDataContext csvDataContext = new CsvDataContext(someInputstream, new CsvConfiguration(1, "utf-8", ';', '"', '\\'));
> {code}
> The offending code is in the static method createFileFromInputStream():
> {code}
> private static File createFileFromInputStream(InputStream inputStream, String encoding) {
>         ....
>         final BufferedWriter writer = FileHelper.getBufferedWriter(file, encoding);
>         final BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
>         ....
> {code}
> The InputStreamReader is instantiated without a charset, so the platform's default charset is used (e.g. "windows-1252"). The BufferedWriter, on the other hand, is instantiated with the specified charset. When the content of the stream is written to the temp directory, this effectively re-encodes the data if the stream is in a different encoding (e.g. "utf-8") than the platform's default encoding.
> Instead the code should be similar to this: 
> {code}
> private static File createFileFromInputStream(InputStream inputStream, String encoding) {
>         ....
>         final BufferedWriter writer = FileHelper.getBufferedWriter(file, encoding);
>         final BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, encoding));
>         ....
> {code}
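
The mismatch described above can be reproduced outside MetaModel. The following is a minimal sketch, not MetaModel code: the class name EncodingMismatchDemo and the helper decode() are hypothetical, and "windows-1252" is passed explicitly as a stand-in for a platform default that differs from the stream's encoding. It decodes the same UTF-8 bytes once with the wrong charset and once with the configured one.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Apache MetaModel.
class EncodingMismatchDemo {

    // Decodes the first line of the given bytes using the given charset,
    // mirroring new BufferedReader(new InputStreamReader(inputStream, encoding)).
    static String decode(byte[] data, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // "Mueller" spelled with a u-umlaut, written as a Unicode escape so the
        // source file itself is pure ASCII.
        String original = "M\u00fcller";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Assumption: windows-1252 plays the role of the platform default charset.
        String wrong = decode(utf8Bytes, Charset.forName("windows-1252"));
        String right = decode(utf8Bytes, StandardCharsets.UTF_8);

        // The two-byte UTF-8 sequence is misread as two separate characters.
        System.out.println("wrong charset round-trips: " + original.equals(wrong));
        System.out.println("configured charset round-trips: " + original.equals(right));
    }
}
```

With the wrong charset the umlaut is decoded as two separate characters, and writing that string back out as UTF-8 stores the garbled form permanently in the temp file.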
> Alternatively, you can skip the encoding completely when copying the InputStream, because the encoding is only used later, when the FileResource is read. A more readable implementation in Java 7 would be:
> {code}
>             tempFile = File.createTempFile("metamodel", ".csv");
>             tempFile.deleteOnExit();
>             Files.copy(resourceAsStream, tempFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
>             return tempFile;
> {code}
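
The byte-for-byte copy proposed in the issue can be tried in isolation. This is a sketch under stated assumptions rather than the actual MetaModel implementation: the class StreamToTempFileDemo and the method copyToTempFile() are hypothetical names, and the sample CSV content is made up for illustration. Because no Reader or Writer is involved, no charset is applied during the copy.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

// Hypothetical demo class; not part of Apache MetaModel.
class StreamToTempFileDemo {

    // Copies the raw bytes of the stream to a temp file without decoding them.
    static File copyToTempFile(InputStream inputStream) throws IOException {
        File tempFile = File.createTempFile("metamodel", ".csv");
        tempFile.deleteOnExit();
        // REPLACE_EXISTING is needed because createTempFile() has already
        // created the (empty) target file.
        Files.copy(inputStream, tempFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return tempFile;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data containing a non-ASCII character (u-umlaut).
        byte[] utf8Bytes = "id;name\n1;M\u00fcller\n".getBytes(StandardCharsets.UTF_8);

        File copy = copyToTempFile(new ByteArrayInputStream(utf8Bytes));
        byte[] onDisk = Files.readAllBytes(copy.toPath());

        // The temp file holds exactly the bytes that came from the stream,
        // regardless of the platform default charset.
        System.out.println("bytes identical: " + Arrays.equals(utf8Bytes, onDisk));
    }
}
```

The configured encoding then only matters at the point where the temp file is read back, which is where the issue says the FileResource applies it.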



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
