commons-issues mailing list archives

From "Vladimir Eatwell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CSV-200) CSVFormat cannot read its own output if input contain escape character followed by quote character
Date Mon, 24 Oct 2016 14:01:06 GMT

    [ https://issues.apache.org/jira/browse/CSV-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602069#comment-15602069 ]

Vladimir Eatwell commented on CSV-200:
--------------------------------------

As far as I can tell, the issue is that I can set both a quote mode and an escape character.
In the CSVPrinter, values are not escaped if a quote character is set:
{code}
    private void print(final Object object, final CharSequence value, final int offset, final int len,
            final Appendable out, final boolean newRecord) throws IOException {
        if (!newRecord) {
            out.append(getDelimiter());
        }
        if (object == null) {
            out.append(value);
        } else if (isQuoteCharacterSet()) {
            // the original object is needed so can check for Number
            printAndQuote(object, value, offset, len, out, newRecord);
        } else if (isEscapeCharacterSet()) {
            printAndEscape(value, offset, len, out);
        } else {
            out.append(value, offset, offset + len);
        }
    }
{code}
That is, we either printAndQuote or printAndEscape, never both, so an escape character inside a quoted value is written out unescaped.

However, in the CSVParser, characters are unescaped inside quoted values:

{code}
    private Token parseEncapsulatedToken(final Token token) throws IOException {
        // save current line number in case needed for IOE
        final long startLineNumber = getCurrentLineNumber();
        int c;
        while (true) {
            c = reader.read();

            if (isEscape(c)) {
                final int unescaped = readEscape();
                if (unescaped == Constants.END_OF_STREAM) { // unexpected char after escape
                    token.content.append((char) c).append((char) reader.getLastChar());
                } else {
                    token.content.append((char) unescaped);
                }
            } else if (isQuoteChar(c)) {
...
{code}
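
To make the asymmetry concrete, here is a minimal, self-contained sketch of the two behaviours. It is a simplified model written for illustration, not the Commons CSV source: the class and method names (QuoteEscapeMismatch, quote, unquote) are hypothetical, and it assumes the printer doubles embedded quote characters while ignoring the escape character, and that the parser treats the escape character as taking the next character literally.

```java
// Simplified model of the printer/parser mismatch; NOT the Commons CSV code.
// All names here are hypothetical and exist only for this illustration.
final class QuoteEscapeMismatch {

    // Writer side (assumed behaviour): wrap the value in quote characters and
    // double any embedded quote character. The escape character is never consulted.
    static String quote(String value, char quoteChar) {
        StringBuilder out = new StringBuilder().append(quoteChar);
        for (char c : value.toCharArray()) {
            if (c == quoteChar) {
                out.append(quoteChar); // double the embedded quote
            }
            out.append(c);
        }
        return out.append(quoteChar).toString();
    }

    // Reader side (assumed behaviour): inside a quoted value the escape character
    // consumes the following character, and a doubled quote yields one quote.
    static String unquote(String text, char quoteChar, char escapeChar) {
        if (text.isEmpty() || text.charAt(0) != quoteChar) {
            throw new IllegalArgumentException("value is not quoted");
        }
        StringBuilder content = new StringBuilder();
        int i = 1;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == escapeChar && i + 1 < text.length()) {
                content.append(text.charAt(i + 1)); // escaped character taken literally
                i += 2;
            } else if (c == quoteChar) {
                if (i + 1 < text.length() && text.charAt(i + 1) == quoteChar) {
                    content.append(quoteChar); // doubled quote inside the value
                    i += 2;
                } else {
                    return content.toString(); // closing quote
                }
            } else {
                content.append(c);
                i++;
            }
        }
        throw new IllegalStateException("EOF reached before the quoted value was closed");
    }

    public static void main(String[] args) {
        String printed = quote("bob/*", '*');
        System.out.println(printed); // *bob/***
        try {
            unquote(printed, '*', '/');
        } catch (IllegalStateException e) {
            System.out.println("parse failed: " + e.getMessage());
        }
    }
}
```

In this model the writer emits `*bob/***`. The reader consumes `/*` as an escaped quote, then reads the remaining `**` as a doubled quote, so it never sees a closing quote and runs off the end of the input.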

> CSVFormat cannot read its own output if input contain escape character followed by quote character
> --------------------------------------------------------------------------------------------------
>
>                 Key: CSV-200
>                 URL: https://issues.apache.org/jira/browse/CSV-200
>             Project: Commons CSV
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: 1.4
>            Reporter: Vladimir Eatwell
>
> I can use CSVFormat to produce CSV that CSVFormat itself cannot subsequently parse; the test below illustrates the failure:
> {code}
> import org.apache.commons.csv.CSVFormat;
> import org.apache.commons.csv.CSVRecord;
> import org.apache.commons.csv.QuoteMode;
> import org.junit.Test;
> import java.io.StringReader;
> import java.util.List;
> public class CSVFormatTest {
>     @Test
>     public void parseFailure() throws Exception {
>         CSVFormat formatter = CSVFormat.DEFAULT;
>         formatter = formatter.withDelimiter(',');
>         formatter = formatter.withQuote('*');
>         formatter = formatter.withEscape('/');
>         formatter = formatter.withNullString("NULL");
>         formatter = formatter.withIgnoreSurroundingSpaces(true);
>         formatter = formatter.withQuoteMode(QuoteMode.MINIMAL);
>         String formatted = formatter.format("bob/*", "token");
>         List<CSVRecord> parsed = formatter.parse(new StringReader(formatted)).getRecords();
>         for (CSVRecord record : parsed) {
>             System.out.println(record.size());
>         }
>     }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
