Date: Wed, 31 Jul 2013 11:03:21 -0400
Subject: Re: [CSV] Accessing a subset of the available headers (Was: Re: [CSV] Headers and the first record)
From: Gary Gregory
To: Commons Developers List

On Wed, Jul 31, 2013 at 10:42 AM, Benedikt Ritter wrote:

> > > > A use case I have now is a CSV file with a lot of columns (~90) but I
> > > > only care about a small subset of the columns (~10). I'd like to be
> > > > able to say withHeader(Set) where the Set may be a subset of the actual
> > > > column names in the header line. This is different from
> > > > withHeader(String[]) because the names in the Set must match the names
> > > > in the header record.
> > >
> > > What you are talking about sounds more like a view or a projection of
> > > the actual content being parsed.
> > > Do we really need this for 1.0 or can it be postponed?
> >
> > This is a real scenario and a real need, not some imaginary complication ;)
> >
> > Even if it is not implemented for 1.0, we should talk about how it
> > should be done such that it fits in and does not cause API problems
> > later. And if I can get it done by then, then that much the better.
>
> Okay, then let's discuss this on a new thread :-)
>
> As I've said, I think we should not push too much into
> withHeaders(String...). Maybe this is some sort of view, where you can pass
> a parser and the headers you are interested in and it will return an
> Iterable (or CSVParser) that just gives access to the specified
> headers you are interested in?
>
> Would it be possible to give a code example of what you have to do with the
> current API in your use case and what you want?

I am switching to withHeader() with no arg (same as a new String[]{}) and
letting the parser guess the headers, then praying that the names match
between the app and the files. That is just as unsafe as forcing the headers
in a fixed order on the parser, because the column order might have changed.

Ideally, the column order should not matter, and it does not when you do a
record.get(String), which is nice.

Calling withHeader() with no args is less brittle than calling it with 90
args. The benefit is that the column order in the file can change without
affecting the app, which is good.

I could use a little more bullet-proofing by making the column names
optionally case-insensitive, but that's a different feature.

Ideally, I want to define the column names in the app as a simple Java enum,
then use an enum as a record key. That does not work for column names that
have spaces in them, as mine do, so it's back to classic static final Strings
as keys. I could create a fancier custom enum but it's not worth it for now.

Gary

> Benedikt
>
> --
> http://people.apache.org/~britter/
> http://www.systemoutprintln.de/
> http://twitter.com/BenediktRitter
> http://github.com/britter

--
E-Mail: garydgregory@gmail.com | ggregory@apache.org
Java Persistence with Hibernate, Second Edition
JUnit in Action, Second Edition
Spring Batch in Action
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory
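
For illustration only, here is a minimal sketch of the "fancier custom enum" idea mentioned in the message: each enum constant carries the real header text (spaces included), and values are looked up by name through an index built from the header line, so column order does not matter. It deliberately uses plain Java rather than the Commons CSV API. The class HeaderSubsetSketch, the Column enum, the column names, and the sample data are all hypothetical, and the comma split is naive (no quoting or escaping).

```java
import java.util.HashMap;
import java.util.Map;

class HeaderSubsetSketch {

    /** Hypothetical column keys; each constant carries the real header text. */
    enum Column {
        FIRST_NAME("First Name"),
        LAST_NAME("Last Name");

        final String header;

        Column(String header) {
            this.header = header;
        }
    }

    /** Maps each header name found in the header line to its column position. */
    static Map<String, Integer> headerIndex(String headerLine) {
        Map<String, Integer> index = new HashMap<>();
        String[] names = headerLine.split(",", -1); // naive split, no quoted fields
        for (int i = 0; i < names.length; i++) {
            index.put(names[i].trim(), i);
        }
        return index;
    }

    /** Looks up a value by column key; fails loudly if the file lacks that header. */
    static String get(Map<String, Integer> index, String recordLine, Column column) {
        Integer position = index.get(column.header);
        if (position == null) {
            throw new IllegalArgumentException("Missing header: " + column.header);
        }
        return recordLine.split(",", -1)[position].trim();
    }

    public static void main(String[] args) {
        // The file has more columns than the app cares about, in arbitrary order.
        Map<String, Integer> index = headerIndex("Last Name,Age,First Name");
        System.out.println(get(index, "Doe,30,Jane", Column.FIRST_NAME)); // prints "Jane"
    }
}
```

Only the columns named in the enum are ever requested, so the other columns in the file are ignored, and a file that lacks one of the expected headers is rejected on first access instead of silently returning the wrong field.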