commons-issues mailing list archives

From "Gabriel Reid (JIRA)" <>
Subject [jira] [Commented] (CSV-110) Add ability to parse single lines
Date Sat, 05 Apr 2014 05:15:16 GMT


Gabriel Reid commented on CSV-110:

Thanks for the info [~garydgregory].

The main use case I'm interested in supporting (and which I didn't make very clear in
the description) is being able to reuse a single CSVLineParser to parse multiple
lines of input that aren't available as a single contiguous stream.

As you pointed out, it's currently easy to parse a single line, but doing so requires
instantiating a new CSVParser for every input line. Avoiding that per-line instantiation
may look like a micro-optimization, but I believe it's worth the effort when parsing many
billions of input records (a common use case when working with Hadoop).

> Add ability to parse single lines
> ---------------------------------
>                 Key: CSV-110
>                 URL:
>             Project: Commons CSV
>          Issue Type: New Feature
>            Reporter: Gabriel Reid
>         Attachments: CSV-110.patch
> Due to the iterator-based API of CSVParser, there is currently no simple and convenient
> way to parse single lines of CSV-formatted data. The intention of this ticket is to add something
> along the lines of the following:
> {code}
> CSVLineParser lineParser = new CSVLineParser(csvFormat);
> String singleLine = "a,b,c";
> CSVRecord singleRecord = lineParser.parseLine(singleLine);
> {code}
> The use case of parsing single lines comes up very often in distributed batch
> processing scenarios (e.g. Hadoop jobs), and CSV-style formats are also regularly used in
> such scenarios. Currently, projects are often forced to build their own ad-hoc CSV parsing
> solutions, so adding the ability to parse single lines to commons-csv would be very useful
> to these projects, as well as to anyone parsing input that isn't necessarily in
> the form of a single stream.

This message was sent by Atlassian JIRA
