commons-issues mailing list archives

From "Serge P. Nekoval (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CSV-229) Allow byte position tracking in CSVParser
Date Tue, 03 Jul 2018 20:48:00 GMT
Serge P. Nekoval created CSV-229:
------------------------------------

             Summary: Allow byte position tracking in CSVParser
                 Key: CSV-229
                 URL: https://issues.apache.org/jira/browse/CSV-229
             Project: Commons CSV
          Issue Type: New Feature
          Components: Parser
            Reporter: Serge P. Nekoval
         Attachments: csv_bytes.patch

This patch makes significant modifications to ExtendedBufferedReader.

The problem is that efficient CSV parsing requires *byte positioning*, not the character
positioning that is currently provided.

The cases where byte positioning is necessary:
* Suspend/resume parsing
* Pagination/split where a large CSV file is read in chunks using file positioning.
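
To illustrate why a character position is not enough for these cases, here is a minimal sketch using only the JDK. The class and method names (BytePositionSketch, byteOffsetOf, resumeFrom) are hypothetical and are not part of Commons CSV or the attached patch. It assumes UTF-8 input and a character count that does not split a surrogate pair.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not the patch's implementation.
public class BytePositionSketch {

    /** Number of bytes needed to encode the first charCount chars of text. */
    static long byteOffsetOf(String text, int charCount, Charset charset) {
        return text.substring(0, charCount).getBytes(charset).length;
    }

    /** Decodes the rest of the data, starting at the given byte offset. */
    static String resumeFrom(byte[] data, long byteOffset, Charset charset) {
        int start = Math.toIntExact(byteOffset);
        return new String(data, start, data.length - start, charset);
    }

    public static void main(String[] args) {
        // \u00eb is a two-byte character in UTF-8.
        String csv = "id,name\n1,Zo\u00eb\n2,Bob\n";
        byte[] data = csv.getBytes(StandardCharsets.UTF_8);

        // Pretend parsing was suspended just before the last record.
        int charsConsumed = csv.indexOf("2,Bob");
        long bytesConsumed = byteOffsetOf(csv, charsConsumed, StandardCharsets.UTF_8);

        // The two positions differ as soon as a multi-byte character is read.
        System.out.println(charsConsumed + " chars, " + bytesConsumed + " bytes");

        // Resuming requires the byte offset, because files are addressed in bytes.
        System.out.print(resumeFrom(data, bytesConsumed, StandardCharsets.UTF_8));
    }
}
```

With a real file, the byte offset would be passed to something like RandomAccessFile.seek before a new reader is created. Seeking to the character count instead would land in the wrong place.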

In its current state, ExtendedBufferedReader cannot track bytes because it relies on
BufferedReader and operates on characters, so I had to redesign it and merge the two classes.

This modification is what we use in our system, so I'm hoping to get it released (otherwise
we have to maintain a custom build of Commons CSV).

Architecturally the solution might be incomplete; however, it provides what I need: getBytePosition()
from a CSVParser. The entire chain only works if you provide both a Reader and a charset.
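
The charset requirement follows from the fact that a Reader only yields characters, and the number of bytes behind each character depends on the encoding. The sketch below is a hypothetical illustration using only the JDK, not the code in csv_bytes.patch. It counts bytes by re-encoding each character as it is read, and assumes well-formed input and a stateless charset that emits no byte order mark (so UTF_16LE is used rather than UTF_16).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; names are not from Commons CSV.
public class ByteCountingReaderSketch {

    /** Reads the reader to the end and returns how many bytes the text occupies in charset. */
    static long countBytes(Reader reader, Charset charset) {
        try {
            long bytes = 0;
            int c;
            while ((c = reader.read()) != -1) {
                char ch = (char) c;
                String unit = String.valueOf(ch);
                if (Character.isHighSurrogate(ch)) {
                    // Encode a surrogate pair as one code point.
                    int next = reader.read();
                    if (next != -1) {
                        unit = new String(new char[] {ch, (char) next});
                    }
                }
                bytes += unit.getBytes(charset).length;
            }
            return bytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "Zo\u00eb";
        // Same three characters, different byte counts per charset.
        System.out.println(countBytes(new StringReader(text), StandardCharsets.ISO_8859_1));
        System.out.println(countBytes(new StringReader(text), StandardCharsets.UTF_8));
        System.out.println(countBytes(new StringReader(text), StandardCharsets.UTF_16LE));
    }
}
```

Re-encoding per character is slow and is shown only to make the dependency on the charset visible. A production version would more likely count bytes at the decoding stage.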



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
