commons-issues mailing list archives

From "Gary Gregory (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CSV-155) Get remaining unformatted text
Date Tue, 11 Aug 2015 07:05:45 GMT

    [ https://issues.apache.org/jira/browse/CSV-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681343#comment-14681343
] 

Gary Gregory commented on CSV-155:
----------------------------------

If your charset for a file uses one byte per character, you could call org.apache.commons.csv.CSVRecord.getCharacterPosition()
on the second row, close the parser, and then read the input with the JRE from that position
until the end.
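
A rough sketch of the "read from that position until the end" step, using only the JRE. The class name RemainingText and the method remainingFrom are hypothetical names for illustration. In real use the offset would come from CSVRecord.getCharacterPosition() on the second record; here it is computed by hand so the example runs without Commons CSV on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class RemainingText {

    // Hypothetical helper: returns everything in `input` from the character
    // offset `charPosition` to the end, without touching any CSV formatting.
    static String remainingFrom(String input, long charPosition) throws IOException {
        try (Reader reader = new StringReader(input)) {
            // skip() may skip fewer characters than requested, so loop.
            long toSkip = charPosition;
            while (toSkip > 0) {
                long skipped = reader.skip(toSkip);
                if (skipped <= 0) {
                    break; // reached the end of the input early
                }
                toSkip -= skipped;
            }
            // Copy the rest of the input verbatim.
            StringBuilder rest = new StringBuilder();
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                rest.append(buffer, 0, n);
            }
            return rest.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        String input = "name,age\nalice,30\nbob,25\n";
        // Stand-in for the value CSVRecord.getCharacterPosition() would
        // report for the second row (assumption: the header is one line).
        long secondRowStart = input.indexOf('\n') + 1;
        System.out.print(remainingFrom(input, secondRowStart));
    }
}
```

For a file, the same idea applies with a fresh reader or stream opened on the file. The character position only equals the byte offset when the charset uses one byte per character, which is why the suggestion carries that condition.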

> Get remaining unformatted text
> ------------------------------
>
>                 Key: CSV-155
>                 URL: https://issues.apache.org/jira/browse/CSV-155
>             Project: Commons CSV
>          Issue Type: New Feature
>          Components: Parser
>    Affects Versions: 1.1
>            Reporter: Jason Steenstra-Pickens
>
> I have the requirement where I need to parse the headers of a CSV string so that I can
validate them and then remove those headers.
> The problem is that the CSVParser creates an internal ExtendedBufferedReader from the
given reader (which in my case is a StringReader). Reading the first record will read and
buffer additional records so if I then try and read directly from the StringReader it does
not return anything.
> To solve this I need to be able to do one of the following:
> # pass in my own ExtendedBufferedReader (so would need to be made public)
> # have a new getter method to be able to retrieve the internal ExtendedBufferedReader
(or as a BufferedReader, either will do)
> # have a new method on the CSVParser to be able to retrieve the remaining raw records.
> The current workaround is to read all the records and then write all the records except
the first back out to a StringWriter, but this is a lot of unnecessary work and code. It
also doesn't really work, since reading the input and writing it back out has a very high
chance of modifying the quotes, escapes, whitespace, etc.
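
The buffering behaviour described in the report can be reproduced with the JRE alone. This is a minimal sketch that uses a plain java.io.BufferedReader as a stand-in for the parser's internal ExtendedBufferedReader (an assumption made for illustration, so the example does not depend on Commons CSV). The class name BufferingDemo and its method are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class BufferingDemo {

    // Reads one line through a BufferedReader, then reads directly from the
    // underlying StringReader. Returns the result of that direct read, where
    // -1 means the underlying reader is already exhausted.
    static int readUnderlyingAfterFirstLine(String input) throws IOException {
        StringReader source = new StringReader(input);
        BufferedReader buffered = new BufferedReader(source);
        // Filling the buffer pulls far more than one line out of `source`.
        buffered.readLine();
        return source.read();
    }

    public static void main(String[] args) throws IOException {
        String csv = "name,age\nalice,30\nbob,25\n";
        int next = readUnderlyingAfterFirstLine(csv);
        System.out.println(next == -1
                ? "underlying reader is exhausted"
                : "underlying reader still has data");
    }
}
```

With a short input like this one, the first buffered read drains the whole StringReader, so the rows after the header are only reachable through the buffering reader.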



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
