commons-issues mailing list archives

From "Peter Koszek (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SANDBOX-263) Excel strategy uses wrong separator
Date Fri, 19 Mar 2010 16:22:35 GMT

    [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847419#action_12847419 ]

Peter Koszek commented on SANDBOX-263:
--------------------------------------

RFC 4180 defines the comma as the field separator.
Excel, on the other hand, uses the local (regional) configuration to choose the separator.

The following approach can help predict which field separator Excel will use:

On Windows, read the registry value "sList" under the key "HKCU\Control Panel\International".
On other systems, avoid a collision with the decimal separator, like this:

         import java.text.DecimalFormatSymbols;
         import java.util.Locale;

         // The following idea is based on a comment from
         // http://www.experts-exchange.com/Programming/Languages/Q_24113673.html
         DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
         char decimalSeparator = dfs.getDecimalSeparator();
         char listSeparator = ',';
         if (decimalSeparator == listSeparator) {
             // If the decimal separator is a comma, use a semicolon so that
             // numbers such as "1,5" do not have to be enclosed in quotes
             listSeparator = ';';
         }
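Combining both steps, a minimal sketch could look like this. The class SeparatorGuesser and its method names are hypothetical (not part of commons-csv). It assumes the Windows "reg query" command is available and that its output uses the usual "sList    REG_SZ    ;" layout; on other systems, or when the query fails, it falls back to the decimal-separator heuristic.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.text.DecimalFormatSymbols;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: guesses the list separator Excel is likely to use.
public class SeparatorGuesser {

    // Locale heuristic: use ';' when the decimal separator is ',', otherwise ','.
    public static char guessFromLocale(Locale locale) {
        char decimalSeparator = DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
        return decimalSeparator == ',' ? ';' : ',';
    }

    // Extracts the value from a line such as "    sList    REG_SZ    ;".
    // Assumption: `reg query` prints name, type and value separated by whitespace.
    public static Optional<Character> parseRegQueryOutput(String output) {
        for (String line : output.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 3 && parts[0].equalsIgnoreCase("sList")
                    && parts[1].equals("REG_SZ") && parts[2].length() == 1) {
                return Optional.of(parts[2].charAt(0));
            }
        }
        return Optional.empty();
    }

    // Runs `reg query` on Windows; returns empty on other systems or on failure.
    public static Optional<Character> readWindowsListSeparator() {
        if (!System.getProperty("os.name", "").startsWith("Windows")) {
            return Optional.empty();
        }
        try {
            Process p = new ProcessBuilder("reg", "query",
                    "HKCU\\Control Panel\\International", "/v", "sList")
                    .redirectErrorStream(true).start();
            StringBuilder out = new StringBuilder();
            try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    out.append(line).append('\n');
                }
            }
            p.waitFor();
            return parseRegQueryOutput(out.toString());
        } catch (IOException e) {
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }

    // Registry value first, locale heuristic as the fallback.
    public static char guessSeparator() {
        return readWindowsListSeparator().orElseGet(() -> guessFromLocale(Locale.getDefault()));
    }

    public static void main(String[] args) {
        System.out.println("Guessed separator: " + guessSeparator());
    }
}
```

The registry value is checked first because users can change the list separator independently of their locale; the locale heuristic only approximates what Excel does.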

CSV should be treated as a standard; Excel is one specific application that uses CSV in its own way.
I wouldn't expect a CSV framework to simulate Excel exactly.
CSV-based formatting works with any separator character.
I expect a CSV framework to fully support the standard and to let me configure my own variants.
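To illustrate that the format itself does not depend on the separator, here is a minimal, hypothetical parser (not the commons-csv API) that follows the RFC 4180 quoting rules but takes the delimiter as a parameter. It is only a sketch: it handles quoted fields, doubled quotes and LF/CRLF line endings, but no comments or escape characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal parser: RFC 4180 quoting with a configurable delimiter.
public class DelimitedParser {

    public static List<List<String>> parse(String input, char delimiter) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < input.length() && input.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    field.append(c); // delimiters and line breaks are literal here
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == delimiter) {
                row.add(field.toString());
                field.setLength(0);
            } else if (c == '\r' || c == '\n') {
                if (c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++; // treat CRLF as a single line break
                }
                row.add(field.toString());
                field.setLength(0);
                rows.add(row);
                row = new ArrayList<>();
            } else {
                field.append(c);
            }
            i++;
        }
        // Flush the last record unless the input ended with a line break.
        if (field.length() > 0 || !row.isEmpty()) {
            row.add(field.toString());
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(parse("a;b\nc;d", ';')); // [[a, b], [c, d]]
        System.out.println(parse("a,b\nc,d", ';')); // [[a,b], [c,d]]
    }
}
```

Passing the delimiter per call keeps the RFC 4180 default (',') available while still allowing the semicolon variant that Excel writes in some locales.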

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED,
>                                                                  ESCAPE_DISABLED, false, false, false, false);
> {code}
> However, when I do a "Save as" in Excel, the separator used is actually {{';'}}. Thus,
> parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> 	public static void main(final String[] args) {
> 		try {
> 			System.out.println("Using ;");
> 			parse("a;b\nc;d");
> 			System.out.println();
> 			System.out.println("Using ,");
> 			parse("a,b\nc,d");
> 		} catch (final IOException e) {
> 			e.printStackTrace();
> 		}
> 	}
> 	private static void parse(final String input) throws IOException {
> 		final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> 		for (final String[] row : data) {
> 			System.out.print("[");
> 			for (final String cell : row) {
> 				System.out.print("(" + cell + ")");
> 			}
> 			System.out.println("]");
> 		}
> 	}
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

-- 
This message is automatically generated by JIRA.