commons-dev mailing list archives

From Gary Gregory <garydgreg...@gmail.com>
Subject Re: [CSV] Accessing a subset of the available headers (Was: Re: [CSV] Headers and the first record)
Date Wed, 31 Jul 2013 15:03:21 GMT
On Wed, Jul 31, 2013 at 10:42 AM, Benedikt Ritter <britter@apache.org> wrote:

> <snip>
>
> >> A use case I have now is a CSV file with a lot of columns (~90) but I only
> >> care about a small subset of the columns (~10). I'd like to be able to say
> >> withHeader(Set) where the Set may be a subset of the actual column names in
> >> the header line. This is different from withHeader(String[]) because the
> >> names in the Set must match the names in the header record.
>
> > >
> > > What you are talking about sounds more like a view or a projection of the
> > > actual content being parsed.
> > > Do we really need this for 1.0 or can it be postponed?
> >
> > This is a real scenario and a real need, not some imaginary complication ;)
> >
> > Even if it is not implemented for 1.0, we should talk about how it
> > should be done such that it fits in and does not cause API problems
> > later. And if I can get it done by then, then that much the better.
> >
>
> Okay, then let's discuss this on a new thread :-)
>
> As I've said, I think we should not push too much into
> withHeaders(String...). Maybe this is some sort of view, where you can pass
> a parser and the headers you are interested in and it will return an
> Iterable<CSVRecord> (or CSVParser) that just gives access to the specified
> headers?
>
> Would it be possible to give a code example of what you have to do with the
> current API in your use case and what you want?
>

I am switching to withHeader() with no args (same as a new String[]{}),
letting the parser read the headers from the header record, and then praying
that the names match between the app and the files. That is just as unsafe as
forcing the headers on the parser in a fixed order, because the column order
might have changed. Ideally, the column order should not matter, and it does
not when you call record.get(String), which is nice.
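
To illustrate why a name-based lookup is independent of column order, here is
a minimal stdlib-only sketch. It is not the Commons CSV implementation: the
class HeaderLookupSketch and its methods are hypothetical, and it assumes
plain comma-separated lines with no quoted fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Commons CSV.
final class HeaderLookupSketch {

    // Maps each column name found in the header line to its position.
    static Map<String, Integer> headerMap(String headerLine) {
        Map<String, Integer> map = new LinkedHashMap<>();
        String[] names = headerLine.split(",");
        for (int i = 0; i < names.length; i++) {
            map.put(names[i].trim(), i);
        }
        return map;
    }

    // Looks a value up by column name and fails loudly if the name is unknown.
    static String get(Map<String, Integer> header, String recordLine, String name) {
        Integer index = header.get(name);
        if (index == null) {
            throw new IllegalArgumentException("No such column: " + name);
        }
        // Limit -1 keeps trailing empty fields so positions stay aligned.
        return recordLine.split(",", -1)[index].trim();
    }

    public static void main(String[] args) {
        Map<String, Integer> header = headerMap("City,Id,First Name");
        System.out.println(get(header, "London,7,Ada", "First Name")); // prints Ada
    }
}
```

The same lookup returns the same value if the file later arrives as
"Id,First Name,City", which is the property I care about. A name that is
missing from the header record still fails, so the names have to match.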

Calling withHeader() with no args is less brittle than calling it with 90
args. The benefit is that the column order in the file can change without
affecting the app, which is good. I could use a little more bullet-proofing
by making the column names optionally case-insensitive, but that's a
different feature.
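
For what it's worth, case-insensitive column names could be sketched outside
the parser along these lines. This is an assumption about how one might do it,
not an existing Commons CSV option; CaseInsensitiveHeaderSketch is a
hypothetical name, and the header line is again assumed to contain no quoted
fields.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Commons CSV.
final class CaseInsensitiveHeaderSketch {

    // Builds a name-to-position map whose keys compare without regard to case.
    static Map<String, Integer> headerMap(String headerLine) {
        Map<String, Integer> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String[] names = headerLine.split(",");
        for (int i = 0; i < names.length; i++) {
            map.put(names[i].trim(), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> header = headerMap("Id,First Name,City");
        System.out.println(header.get("first name")); // prints 1
    }
}
```

One caveat: two columns whose names differ only by case would collide in this
map, and the later one would win.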

Ideally, I want to define the column names in the app as a simple Java
enum, then use an enum as a record key. That does not work for column names
that have spaces in them as mine do, so it's back to classic static final
Strings as keys. I could create a fancier custom enum but it's not worth it
for now.
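
The "fancier custom enum" I have in mind would look roughly like the sketch
below: each constant carries the real column name, spaces included. The column
names are made up for illustration, and the record is modeled as a plain
Map<String, String> from header name to value rather than a CSVRecord.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; the column names are invented.
final class ColumnEnumSketch {

    enum Column {
        CUSTOMER_ID("Customer Id"),
        FIRST_NAME("First Name"),
        POSTAL_CODE("Postal Code");

        private final String header;

        Column(String header) {
            this.header = header;
        }

        // The column name exactly as it appears in the header record.
        String header() {
            return header;
        }
    }

    // Reads a value using the enum constant as the key.
    static String get(Map<String, String> record, Column column) {
        return record.get(column.header());
    }

    public static void main(String[] args) {
        Map<String, String> record = new HashMap<>();
        record.put("First Name", "Ada");
        System.out.println(get(record, Column.FIRST_NAME)); // prints Ada
    }
}
```

With a parsed record, the equivalent would be record.get(column.header()),
which keeps the app-side names in one place.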

Gary


> Benedikt
>
>
>
> --
> http://people.apache.org/~britter/
> http://www.systemoutprintln.de/
> http://twitter.com/BenediktRitter
> http://github.com/britter
>



-- 
E-Mail: garydgregory@gmail.com | ggregory@apache.org
Java Persistence with Hibernate, Second Edition<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory
