commons-issues mailing list archives

From Patrick Gäckle (JIRA) <>
Subject [jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter
Date Wed, 28 Mar 2018 06:57:00 GMT


Patrick Gäckle commented on CSV-222:

Setting the end-of-record marker to SOH-STX-LF would help me as this would match my current
Recovering from junk would be the long-lasting solution. I can think of a _lazy reading option_
that, instead of throwing an error when something unexpected appears between an encapsulated
token and the delimiter, just continues, either without taking any action or by, for example,
appending the text to the current field/header or continuing to the next field.
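
Until an option like that exists, one possible workaround outside the library is to drop the SOH and STX characters from the input before the parser sees it. The following is only a minimal sketch under that assumption: `ControlCharStrippingReader` and its `strip` helper are hypothetical names, not part of Commons CSV, and the filter removes SOH/STX everywhere, including inside field values where they might be intended.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper (not part of Commons CSV): removes SOH and STX from a character stream. */
final class ControlCharStrippingReader extends FilterReader {

    private static final char SOH = (char) 0x01; // start of heading
    private static final char STX = (char) 0x02; // start of text

    ControlCharStrippingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c;
        do {
            c = super.read();
        } while (c == SOH || c == STX); // skip junk, pass everything else (including -1)
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n <= 0) {
                return n; // -1 signals end of stream
            }
            // Compact the chunk in place, keeping only characters that are not SOH/STX.
            int kept = 0;
            for (int i = 0; i < n; i++) {
                char c = buf[off + i];
                if (c != SOH && c != STX) {
                    buf[off + kept++] = c;
                }
            }
            if (kept > 0) {
                return kept;
            }
            // The whole chunk was junk; read again rather than returning 0.
        }
    }

    /** Convenience wrapper for trying the filter on an in-memory string. */
    static String strip(String input) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new ControlCharStrippingReader(new StringReader(input))) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

With the code from the issue, the existing reader would presumably be wrapped before parsing, e.g. `csvFormat.parse(new ControlCharStrippingReader(reader))`. I have not verified this against faulty.csv.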


> invalid char between encapsulated token and delimiter
> -----------------------------------------------------
>                 Key: CSV-222
>                 URL:
>             Project: Commons CSV
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: 1.4
>            Reporter: Patrick Gäckle
>            Priority: Major
>         Attachments: faulty.csv
> When trying to read the file [^faulty.csv] and parse it I get the following error:
> {code}
> (line 1) invalid char between encapsulated token and delimiter
> 	at org.apache.commons.csv.Lexer.parseEncapsulatedToken(
> 	at org.apache.commons.csv.Lexer.nextToken(
> 	at org.apache.commons.csv.CSVParser.nextRecord(
> 	at org.apache.commons.csv.CSVParser.initializeHeader(
> 	at org.apache.commons.csv.CSVParser.<init>(
> 	at org.apache.commons.csv.CSVParser.<init>(
> 	at org.apache.commons.csv.CSVFormat.parse(
> {code}
> The failing code is the part that parses the input and returns its iterator:
> {code:java}
> csvFormat = CSVFormat.DEFAULT.withHeader().withDelimiter(';').withIgnoreHeaderCase();
> iterator = csvFormat.parse(reader).iterator();
> {code}
> The invalid char is the contained SOH and STX non printable characters at the end of
> I debugged through the source and ran into the exception in the Lexer, which does not handle
> these special characters.
> Unfortunately I'm not able to provide any hints on fixing this, as I'm not familiar with
> these types of characters and what behaviour they should have.
> Sincerely

This message was sent by Atlassian JIRA
