commons-issues mailing list archives

From "Gary Gregory (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CSV-219) The behavior of quote char using is not similar as Excel does when the first string contains CJK char(s)
Date Sun, 10 Dec 2017 23:19:00 GMT

    [ https://issues.apache.org/jira/browse/CSV-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285409#comment-16285409 ]

Gary Gregory commented on CSV-219:
----------------------------------

Our quoting seems off IMO. Why not simply do:
{noformat}
diff --git a/src/main/java/org/apache/commons/csv/CSVFormat.java b/src/main/java/org/apache/commons/csv/CSVFormat.java
index 58948fd..dc7588b 100644
--- a/src/main/java/org/apache/commons/csv/CSVFormat.java
+++ b/src/main/java/org/apache/commons/csv/CSVFormat.java
@@ -1186,10 +1186,7 @@ public final class CSVFormat implements Serializable {
             } else {
                 char c = value.charAt(pos);

-                // RFC4180 (https://tools.ietf.org/html/rfc4180) TEXTDATA =  %x20-21 / %x23-2B / %x2D-7E
-                if (newRecord && (c < 0x20 || c > 0x21 && c < 0x23 || c > 0x2B && c < 0x2D || c > 0x7E)) {
-                    quote = true;
-                } else if (c <= COMMENT) {
+                if (c <= COMMENT) {
                     // Some other chars at the start of a value caused the parser to fail, so for now
                     // encapsulate if we start in anything less than '#'. We are being conservative
                     // by including the default comment char too.
diff --git a/src/test/java/org/apache/commons/csv/CSVPrinterTest.java b/src/test/java/org/apache/commons/csv/CSVPrinterTest.java
index ae7aae2..dde7c19 100644
--- a/src/test/java/org/apache/commons/csv/CSVPrinterTest.java
+++ b/src/test/java/org/apache/commons/csv/CSVPrinterTest.java
@@ -1037,7 +1037,7 @@ public class CSVPrinterTest {
         final StringWriter sw = new StringWriter();
         try (final CSVPrinter printer = new CSVPrinter(sw, CSVFormat.RFC4180)) {
             printer.printRecord(EURO_CH, "Deux");
-            assertEquals("\"" + EURO_CH + "\",Deux" + recordSeparator, sw.toString());
+            assertEquals(EURO_CH + ",Deux" + recordSeparator, sw.toString());
         }
     }
{noformat}
I do not see why a first character outside TEXTDATA should force the first field of a record to be quoted.

Thoughts from others? With the above patch, all tests pass.
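
To make the effect of the patch concrete, here is a minimal, self-contained sketch of the first-character check before and after the change. It is not the actual CSVFormat code: the class and method names are hypothetical, the value of COMMENT is assumed to be '#' based on the comment in the diff, and the rest of the quoting logic (which the diff does not show) is left out.

```java
public class FirstCharQuoting {

    // Assumption: COMMENT is '#', as the code comment in the diff suggests.
    static final char COMMENT = '#';

    // The test removed by the patch: true when c is outside the
    // RFC 4180 TEXTDATA ranges %x20-21 / %x23-2B / %x2D-7E.
    static boolean outsideTextData(char c) {
        return c < 0x20 || c > 0x21 && c < 0x23 || c > 0x2B && c < 0x2D || c > 0x7E;
    }

    // First-character decision before the patch (only the two branches in the diff).
    static boolean quotesBefore(boolean newRecord, char c) {
        return (newRecord && outsideTextData(c)) || c <= COMMENT;
    }

    // First-character decision after the patch.
    static boolean quotesAfter(char c) {
        return c <= COMMENT;
    }

    public static void main(String[] args) {
        // 'a', '#', HIRAGANA LETTER A (U+3042), EURO SIGN (U+20AC)
        char[] samples = {'a', '#', '\u3042', '\u20AC'};
        for (char c : samples) {
            System.out.printf("U+%04X before=%b after=%b%n",
                    (int) c, quotesBefore(true, c), quotesAfter(c));
        }
    }
}
```

Under these assumptions, a first field starting with a character above 0x7E (such as a CJK character or the euro sign) triggers quoting only through the removed branch, and only at the start of a record, which matches the output reported in the issue.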

> The behavior of quote char using is not similar as Excel does when the first string contains CJK char(s)
> --------------------------------------------------------------------------------------------------------
>
>                 Key: CSV-219
>                 URL: https://issues.apache.org/jira/browse/CSV-219
>             Project: Commons CSV
>          Issue Type: Bug
>          Components: Printer
>    Affects Versions: 1.5
>            Reporter: Zhang Hongda
>         Attachments: diff.patch
>
>
> When using CSVFormat.EXCEL to print a CSV file, quote characters are not applied the way Microsoft Excel applies them when the first string contains Chinese, Japanese or Korean (CJK) char(s).
> e.g.
> There are 3 data members in a record, with Japanese chars: "あ", "い", "う":
>   Microsoft Excel outputs:
>   あ,い,う
>   Apache Commons CSV outputs:
>   "あ",い,う



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
