commons-issues mailing list archives

From "Peter Koszek (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (SANDBOX-263) Excel strategy uses wrong separator
Date Sat, 20 Mar 2010 14:33:27 GMT

    [ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847419#action_12847419 ]

Peter Koszek edited comment on SANDBOX-263 at 3/20/10 2:32 PM:
---------------------------------------------------------------

RFC 4180 defines the comma as the field separator.
Excel, however, uses the local (regional) configuration to identify the separator.

On [Experts-Exchange|http://www.experts-exchange.com/Programming/Languages/Q_24113673.html]
we are told: "But in different countries the separator is different. Some use a ',' some use
a ';' some use a '.' some use a ':'"
At [Microsoft Support|http://support.microsoft.com/kb/94825/EN-US] we can read: "For most
international versions, the default list separator is a semicolon (;). However, in Visual
Basic for Applications code, you must type the English function or property name and use a
comma (,) as a list separator."

The following approach can help to predict the field separator.
On Windows, read the registry value "HKCU\Control Panel\International\sList" if possible.
On other systems, try to avoid a collision with the decimal separator like this:
{code:java}
import java.text.DecimalFormatSymbols;
import java.util.Locale;

DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
char decimalSeparator = dfs.getDecimalSeparator();
char listSeparator = ',';
// If the decimal separator is a comma, use a semicolon instead so that
// numbers such as "3,14" do not need encapsulation (quoting)
if (decimalSeparator == listSeparator) {
    listSeparator = ';';
}
{code}
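The two steps described above (registry on Windows, decimal-separator heuristic elsewhere) could be combined roughly as follows. This is only a sketch under stated assumptions: it shells out to the Windows {{reg query}} command (assumed to be on the PATH) instead of using a registry library, it parses reg's plain-text output by looking for the {{REG_SZ}} column, and the class and method names ({{SeparatorGuesser}}, {{guessListSeparator}}, {{guessFromLocale}}, {{parseRegQueryOutput}}) are hypothetical, not part of Commons CSV.

{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper; not part of Commons CSV.
public class SeparatorGuesser {

    // Extracts the sList value from the text printed by
    // reg query "HKCU\Control Panel\International" /v sList
    // (assumed format: "    sList    REG_SZ    ;").
    // Returns 0 if no usable value is found; a whitespace-only value counts as missing.
    static char parseRegQueryOutput(String output) {
        for (String line : output.split("\\r?\\n")) {
            int idx = line.indexOf("REG_SZ");
            if (idx >= 0) {
                String value = line.substring(idx + "REG_SZ".length()).trim();
                if (!value.isEmpty()) {
                    return value.charAt(0);
                }
            }
        }
        return 0;
    }

    // Decimal-separator heuristic: use ';' when the locale writes decimals with ','.
    static char guessFromLocale(Locale locale) {
        char decimalSeparator = DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
        return decimalSeparator == ',' ? ';' : ',';
    }

    // On Windows, ask the registry through reg.exe; on other systems,
    // or if anything fails, fall back to the locale heuristic.
    static char guessListSeparator() {
        String os = System.getProperty("os.name", "").toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            try {
                Process p = new ProcessBuilder("reg", "query",
                        "HKCU\\Control Panel\\International", "/v", "sList")
                        .redirectErrorStream(true)
                        .start();
                StringBuilder out = new StringBuilder();
                try (BufferedReader r = new BufferedReader(
                        new InputStreamReader(p.getInputStream(), Charset.defaultCharset()))) {
                    String line;
                    while ((line = r.readLine()) != null) {
                        out.append(line).append('\n');
                    }
                }
                if (p.waitFor() == 0) {
                    char c = parseRegQueryOutput(out.toString());
                    if (c != 0) {
                        return c;
                    }
                }
            } catch (IOException e) {
                // reg.exe not available or not readable: use the heuristic below
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return guessFromLocale(Locale.getDefault());
    }

    public static void main(String[] args) {
        System.out.println("Guessed separator: " + guessListSeparator());
    }
}
{code}

The registry value reflects what the user actually configured, so it is preferred; the locale heuristic is only a fallback and can still guess wrong for users who changed their list separator.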
CSV should be a standard; Excel is a specific application that uses the CSV format in its own
way.
I wouldn't expect a CSV framework to simulate Excel exactly.
CSV-based formatting works with any separator character.
I expect a CSV framework to fully support the standard and to let me configure individual
solutions.
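To illustrate that CSV-style parsing works with any separator character, here is a minimal, library-independent sketch of a single-record splitter with a configurable delimiter and RFC 4180 style quoting ('"' as encapsulator, "" as an escaped quote). It is not the Commons CSV API; the class name {{DelimitedLineSplitter}} is hypothetical, it is lenient about malformed quoting, and it does not handle line breaks inside quoted fields.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of Commons CSV.
public class DelimitedLineSplitter {

    // Splits one record on the given delimiter. Inside quotes the delimiter
    // is ordinary text, and a doubled quote ("") stands for one quote.
    static List<String> split(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == delimiter) {
                fields.add(field.toString()); // end of field
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString()); // last field
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("a;b", ','));        // [a;b]
        System.out.println(split("a;b", ';'));        // [a, b]
        System.out.println(split("\"3,14\";x", ';')); // [3,14, x]
    }
}
{code}

The first call mirrors the behaviour reported in the issue: with ',' as the delimiter, "a;b" stays a single field. Only the delimiter parameter changes between the first two calls.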

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY   = new CSVStrategy(',', '"', COMMENTS_DISABLED,
>                                                                  ESCAPE_DISABLED, false, false, false, false);
> {code}
> However, when I do a "Save As" in Excel, the separator actually used is {{';'}}. Thus,
> parsing the CSV file as suggested in the JavaDoc of {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
> 	public static void main(final String[] args) {
> 		try {
> 			System.out.println("Using ;");
> 			parse("a;b\nc;d");
> 			System.out.println();
> 			System.out.println("Using ,");
> 			parse("a,b\nc,d");
> 		} catch (final IOException e) {
> 			e.printStackTrace();
> 		}
> 	}
> 	private static void parse(final String input) throws IOException {
> 		final String[][] data = (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> 		for (final String[] row : data) {
> 			System.out.print("[");
> 			for (final String cell : row) {
> 				System.out.print("(" + cell + ")");
> 			}
> 			System.out.println("]");
> 		}
> 	}
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}

-- 
This message is automatically generated by JIRA.

