commons-issues mailing list archives

From "Emmanuel Bourg (Commented) (JIRA)" <>
Subject [jira] [Commented] (CSV-67) UnicodeUnescapeReader should not be applied before parsing
Date Fri, 16 Mar 2012 08:13:43 GMT


Emmanuel Bourg commented on CSV-67:

Good point, but I'm not sure it actually happens in practice. So far the only application I
have found that supports Unicode escapes is HSQLDB. It can read them but doesn't write them
(I checked HSQLDB 1.8; I'll look at 2.x). I believe these escapes are typically created by a
program like native2ascii, which converts only non-ASCII characters, so the line separators
should be safe.
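
To illustrate the native2ascii argument, here is a minimal sketch of an encoder that escapes only non-ASCII characters. The class and method names are hypothetical, and this is not the JDK tool itself, only an approximation of the behaviour described above. Under that assumption, CR and LF are ASCII and pass through untouched, so no escaped line separators are produced.

```java
// Hypothetical sketch of a native2ascii-style encoder; not the JDK tool itself.
class AsciiEscapeDemo {

    // Escape only characters outside the 7-bit ASCII range; everything else,
    // including CR and LF, is copied through unchanged.
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                // Emit a backslash-u escape with four hex digits.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // One non-ASCII character (e-acute) followed by a real CRLF.
        String input = "caf" + (char) 0xE9 + ",1\r\nb,2";
        String escaped = escapeNonAscii(input);
        // The CRLF is still a real line break in the escaped text.
        System.out.println(escaped.contains("\r\n"));
    }
}
```

Files produced this way would only contain escapes for characters above 0x7F, which is why the record-terminator problem may be rare in practice, even though nothing prevents another tool from escaping CR or LF.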

I agree on removing the Unicode escape setting from CSVFormat. I would prefer submitting the
reader to [io] rather than making it public in [csv], though.

> UnicodeUnescapeReader should not be applied before parsing
> ----------------------------------------------------------
>                 Key: CSV-67
>                 URL:
>             Project: Commons CSV
>          Issue Type: Bug
>            Reporter: Sebb
> The UnicodeUnescapeReader is currently applied before the input file is parsed.
> This means that unicode escapes are treated differently from other escapes.
> For example, the sequence <esc>r<esc>n is not treated as a new-line for the
> purpose of recognising the end of a record, yet \u000D\u000A is converted to CRLF and would
> terminate the record (unless embedded in a quoted string).
> The unicode escape processing (if selected) should occur as part of the parsing, just
> as for ordinary escape processing.
> The class can be made public so the user can wrap the input if required; this preserves
> the existing functionality should it be required, so there is no need to introduce another
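
The ordering problem described in the issue can be sketched as follows. This is an illustration only: the `unescape` and `records` helpers are hypothetical stand-ins, not the Commons CSV classes, and the record splitter deliberately ignores quoting. It shows that unescaping before splitting turns an escaped CRLF into a record terminator, while splitting first leaves the record intact.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; not the Commons CSV UnicodeUnescapeReader.
class UnescapeOrderDemo {

    // Matches a literal backslash, the letter u, and four hex digits.
    private static final Pattern UNICODE_ESCAPE =
            Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Replace each backslash-u escape with the character it denotes.
    static String unescape(String s) {
        Matcher m = UNICODE_ESCAPE.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(s, last, s.length());
        return out.toString();
    }

    // Naive record splitter: ignores quoting, splits on CRLF, CR or LF.
    static String[] records(String s) {
        return s.split("\r\n|\r|\n");
    }

    public static void main(String[] args) {
        // The input holds escaped CR and LF as text; it has no real line break.
        String input = "a,b\\u000D\\u000Ac,d";
        System.out.println(records(unescape(input)).length); // unescape first: prints 2
        System.out.println(records(input).length);           // split first: prints 1
    }
}
```

Doing the unescaping inside the parser, after record boundaries are recognised, would correspond to the second case.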
