commons-dev mailing list archives

From Benedikt Ritter <brit...@apache.org>
Subject Re: [CSV] Accessing a subset of the available headers (Was: Re: [CSV] Headers and the first record)
Date Wed, 31 Jul 2013 18:38:55 GMT
2013/7/31 Gary Gregory <garydgregory@gmail.com>

> On Wed, Jul 31, 2013 at 10:42 AM, Benedikt Ritter <britter@apache.org> wrote:
>
> > <snip>
> >
> > >> A use case I have now is a CSV file with a lot of columns (~90) but I
> > >> only care about a small subset of the columns (~10). I'd like to be able
> > >> to say withHeader(Set) where the Set may be a subset of the actual column
> > >> names in the header line. This is different from withHeader(String[])
> > >> because the names in the Set must match the names in the header record.
> >
> > > >
> > > > What you are talking about sounds more like a view or a projection of
> > > > the actual content being parsed.
> > > > Do we really need this for 1.0 or can it be postponed?
> > >
> > > This is a real scenario and a real need, not some imaginary
> > > complication ;)
> > >
> > > Even if it is not implemented for 1.0, we should talk about how it
> > > should be done such that it fits in and does not cause API problems
> > > later. And if I can get it done by then, so much the better.
> > >
> >
> > Okay, then let's discuss this on a new thread :-)
> >
> > As I've said, I think we should not push too much into
> > withHeader(String...). Maybe this is some sort of view, where you can
> > pass a parser and the headers you are interested in and it will return an
> > Iterable<CSVRecord> (or CSVParser) that just gives access to the
> > specified headers you are interested in?
> >
> > Would it be possible to give a code example of what you have to do with
> > the current API in your use case and what you want?
> >
>
> I am switching to withHeader() with no arg (same as a new String[]{}) and
> let the parser guess the headers and then pray that the names match between
> the app and the files. Which is just as unsafe as forcing the headers in
> fixed order on the parser because the column order might have changed.
> Ideally, the column order should not matter, which it does not when you do
> a record.get(String), which is nice.
>
> Calling withHeader() with no args is less brittle than calling it with 90
> args. The benefit is that the column order in the file can change without
> affecting the app, which is good. I could use a little more bullet-proofing
> by making the column names optionally case-insensitive, but that's a
> different feature.
>
> Ideally, I want to define the column names in the app as a simple Java
> enum, then use an enum as a record key. That does not work for column names
> that have spaces in them as mine do, so it's back to classic static final
> Strings as keys. I could create a fancier custom enum but it's not worth it
> for now.
>

Hey Gary,

I still don't understand what you are suggesting. At first I thought this
was about accessing a subset of the actual columns (you said your file has
90 columns but you are only interested in ~10).
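
To make that first reading concrete, here is a rough sketch in plain Java
with no Commons CSV dependency. ColumnView, headerMap and project are
hypothetical names for illustration only, not existing or proposed API, and
the header map is assumed to be a simple name-to-index map:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Commons CSV.
final class ColumnView {

    // Maps each header name to its column index, in file order.
    static Map<String, Integer> headerMap(List<String> headerRecord) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (int i = 0; i < headerRecord.size(); i++) {
            map.put(headerRecord.get(i), i);
        }
        return map;
    }

    // Copies only the wanted columns out of one record, looked up by name.
    // Throws if a wanted name is not in the header, so a mismatch fails fast.
    static Map<String, String> project(Map<String, Integer> headers,
                                       List<String> record,
                                       List<String> wanted) {
        Map<String, String> view = new LinkedHashMap<>();
        for (String name : wanted) {
            Integer index = headers.get(name);
            if (index == null) {
                throw new IllegalArgumentException("No such column: " + name);
            }
            view.put(name, record.get(index));
        }
        return view;
    }

    public static void main(String[] args) {
        Map<String, Integer> headers =
                headerMap(Arrays.asList("Id", "First Name", "City", "Zip Code"));
        List<String> record = Arrays.asList("7", "Ada", "London", "N1");
        // Only two of the four columns are exposed, independent of file order.
        System.out.println(project(headers, record, Arrays.asList("City", "Id")));
    }
}
```

In this sketch the view is a per-record copy; a real implementation could
just as well wrap the parser and project lazily.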

Your last message sounds more like you're looking for a better way to make
sure the headers parsed from the file match what you are expecting. I guess
this is why getHeaderMap is now public (?!)
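
If it is the second reading, the check could look roughly like the sketch
below. It is plain Java and stands in for whatever the parser exposes: the
class HeaderCheck and its methods are hypothetical, and the parsed header is
assumed to be available as a name-to-index map:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Commons CSV.
final class HeaderCheck {

    // Builds a name -> column index map from the header line,
    // standing in for the map a parser would derive from the file.
    static Map<String, Integer> headerMap(List<String> headerRecord) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (int i = 0; i < headerRecord.size(); i++) {
            map.put(headerRecord.get(i), i);
        }
        return map;
    }

    // Returns the expected names that are absent from the parsed header.
    // An empty result means every expected column is present.
    static Set<String> missingHeaders(Map<String, Integer> parsed,
                                      Set<String> expected) {
        Set<String> missing = new LinkedHashSet<>(expected);
        missing.removeAll(parsed.keySet());
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Integer> parsed =
                headerMap(Arrays.asList("First Name", "Last Name", "City"));
        Set<String> expected =
                new LinkedHashSet<>(Arrays.asList("Last Name", "Zip Code"));
        // Reports which expected columns the file does not provide.
        System.out.println(missingHeaders(parsed, expected));
    }
}
```

The comparison here is exact and case-sensitive; case-insensitive matching
would be a separate decision.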

What am I missing?

Benedikt


>
> Gary
>
>
> > Benedikt



-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter
