commons-issues mailing list archives

From "Serge P. Nekoval (JIRA)" <>
Subject [jira] [Updated] (CSV-229) Allow byte position tracking in CSVParser
Date Tue, 03 Jul 2018 21:22:00 GMT


Serge P. Nekoval updated CSV-229:
    Attachment: csv_bytes3.patch

> Allow byte position tracking in CSVParser
> -----------------------------------------
>                 Key: CSV-229
>                 URL:
>             Project: Commons CSV
>          Issue Type: New Feature
>          Components: Parser
>            Reporter: Serge P. Nekoval
>            Priority: Major
>         Attachments: csv_bytes3.patch
> This patch makes significant modifications to ExtendedBufferedReader.
> The problem is that efficient CSV parsing requires *byte positioning*, not the
> character positioning currently provided.
> The cases where byte positioning is necessary:
> * Suspend/resume parsing
> * Pagination/split where a large CSV file is read in chunks using file positioning.
> I've found the ExtendedBufferedReader to be unable to manage bytes in its current state
> (relying on BufferedReader and characters), so instead I had to redesign/merge these two classes.
> This modification is what we use in our system, so I'm hoping to get it released (otherwise
> we have to deal with a custom build of Commons CSV).
> Architecturally the solution might be incomplete; however, it provides what I need: getBytePosition()
> from a CSVParser. The entire chain only works if you provide a Reader AND a charset!
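
To illustrate why a charset is needed alongside the Reader, here is a minimal sketch that derives a byte position from the characters a Reader hands out. It is not the code in csv_bytes3.patch and not part of Commons CSV: ByteCountingReader, getBytePosition() on that class, and bytesConsumed() are hypothetical names, and only JDK classes are used. It assumes a stateless encoding without a byte order mark (UTF-8, ISO-8859-1) and that every character is obtained through read(), not skip().

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: counts how many bytes the characters read so far occupy in a charset. */
class ByteCountingReader extends FilterReader {
    private final Charset charset;
    private long bytePosition;       // encoded size of everything read so far
    private char pendingHigh;        // high surrogate waiting for its low half
    private boolean hasPendingHigh;

    ByteCountingReader(Reader in, Charset charset) {
        super(in);
        this.charset = charset;
    }

    long getBytePosition() {
        return bytePosition;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c >= 0) {
            count((char) c);
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = 0; i < n; i++) {   // loop is skipped when n is -1 (end of stream)
            count(buf[off + i]);
        }
        return n;
    }

    private void count(char c) {
        if (hasPendingHigh) {
            hasPendingHigh = false;
            if (Character.isLowSurrogate(c)) {
                // A surrogate pair must be encoded as one code point.
                bytePosition += encodedLength(new String(new char[] {pendingHigh, c}));
                return;
            }
            // Unpaired high surrogate: count whatever the charset substitutes for it.
            bytePosition += encodedLength(String.valueOf(pendingHigh));
        }
        if (Character.isHighSurrogate(c)) {
            pendingHigh = c;
            hasPendingHigh = true;
        } else {
            bytePosition += encodedLength(String.valueOf(c));
        }
    }

    private int encodedLength(String s) {
        return s.getBytes(charset).length;
    }

    /** Reads the whole text through the counting reader and returns the byte total. */
    static long bytesConsumed(String text, Charset charset) {
        try (ByteCountingReader r = new ByteCountingReader(new StringReader(text), charset)) {
            char[] buf = new char[4];
            while (r.read(buf, 0, buf.length) != -1) {
                // drain the reader
            }
            return r.getBytePosition();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the six-character record "a,\u00e9,\u20ac\n", bytesConsumed(..., StandardCharsets.UTF_8) yields 9, so a character offset of 6 would point into the middle of the data if used for file positioning. Re-encoding each character is slow; the sketch only shows the relationship between the two offsets, and a real implementation would more likely track bytes while decoding.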

This message was sent by Atlassian JIRA
