Date: Wed, 31 Jul 2013 19:38:04 -0400
Subject: [csv] the plot thickens: multi-record headers
From: Gary Gregory
To: Commons Developers List

Hi All,

I found an interesting set of sample CSV files [1] on the IBM Tivoli web site.

Each file seems to have what I will call, for lack of a better term, a "pre-header".

For example, the simplest file [2] looks like this (3 lines):

SRM_SaaS_ES,MXASSETInterface,AddChange,EN
ASSETNUM,AS_SITEID
mesa01,SDASITE

A more complex one is [3] (4 lines):

SRM_SaaS_ES,MXASSETInterface,AddChange,EN
ASSETNUM,ASSETTAG,AUTOWOGEN,BUDGETCOST,CALNUM,CHANGEBY,CHANGEDATE,CHANGEPMSTATUS,CHILDREN,AS_DESCRIPTION,DISABLED,FAILURECODE,HIERARCHYPATH,INVCOST,ISLINEAR,ISRUNNING,AS_ITEMNUM,AS_ITEMSETID,MAINTHIERCHY,AS_MANUFACTURER,MOVED,AS_ORGID,PRIORITY,PURCHASEPRICE,REMOVEFROMACTIVEROUTES,REMOVEFROMACTIVESP,REPLACECOST,ROLLTOALLCHILDREN,ROTSUSPACCT,AS_SENDERSYSID,AS_SITEID,AS_STATUS,AS_STATUSDATE,TOTALCOST,TOTDOWNTIME,TOTUNCHARGEDCOST,UNCHARGEDCOST,USAGE,VENDOR,YTDCOST,ACTIVE,ASSETMETERID,AVGCALCMETHOD,AM_CHANGEBY,AM_CHANGEDATE,LIFETODATE,LINEARASSETMETERID,MEASUREUNITID,METERNAME,AM_ORGID,READINGTYPE,ROLLDOWNSOURCE,SEQUENCE,SINCEINSTALL,SINCELASTINSPEC,SINCELASTOVERHAUL,SINCELASTREPAIR
cent41,6491,0,0,COMPANY1,MAXADMIN,2010-04-15T17:18:18-07:00,0,0,Centrifugal Pump 100GPM/60FT HD,0,PUMPS,PUMP \ CNTRFGL,0,0,1,PUMP100,SET1,0,IR,0,SDAORG,2,0,0,0,0,0,6600-869-800,MX,SDASITE,OPERATING,2010-04-15T17:18:18-07:00,0,0,0,0,,IR,0,1,29,ALL,MAXADMIN,2010-04-15T17:19:11-07:00,0,0,HOURS,FLTHRS,SDAORG,DELTA,ASSET,1,0,0,0,0
cent41,6491,0,0,COMPANY1,MAXADMIN,2010-04-15T17:18:18-07:00,0,0,Centrifugal Pump 100GPM/60FT HD,0,PUMPS,PUMP \ CNTRFGL,0,0,1,PUMP100,SET1,0,IR,0,SDAORG,2,0,0,0,0,0,6600-869-800,MX,SDASITE,OPERATING,2010-04-15T17:18:18-07:00,0,0,0,0,,IR,0,1,30,,MAXADMIN,2010-04-15T17:19:38-07:00,0,0,DEG F,TEMP-F,SDAORG,,,2,0,0,0,0

The first line of both files, and of the other files I checked, is:

SRM_SaaS_ES,MXASSETInterface,AddChange,EN

which is NOT the list of column names for the data, as far as I can tell.

To properly process these files, it looks like we need to either:

(1) expand the concept of a header to include multiple records, specifying which one is the header record for column names, or,
(2) add a skipFirstRecords setting.

Thoughts?

Gary

[1] http://pic.dhe.ibm.com/infocenter/tivihelp/v41r1/index.jsp?topic=%2Fcom.ibm.ismsaas.doc%2Fimport%2Fr_sample_csv_files.html
[2] http://pic.dhe.ibm.com/infocenter/tivihelp/v41r1/topic/com.ibm.ismsaas.doc/reference/AssetsImportMinimumSample.csv
[3] http://pic.dhe.ibm.com/infocenter/tivihelp/v41r1/topic/com.ibm.ismsaas.doc/reference/AssetsImportExtendedSample.csv

-- 
E-Mail: garydgregory@gmail.com | ggregory@apache.org
Java Persistence with Hibernate, Second Edition
JUnit in Action, Second Edition
Spring Batch in Action
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory
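
To make option (2) concrete, here is a minimal sketch in plain Java of what skipping the first records before reading the header could look like. It is only an illustration under stated assumptions: the class PreHeaderCsv and its parse method are hypothetical and are not part of the Commons CSV API, and the parser splits naively on commas with no support for quoting or escaping. The skipFirstRecords parameter mirrors the setting name proposed above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; NOT part of Commons CSV.
final class PreHeaderCsv {

    /**
     * Skips the first skipFirstRecords lines (the "pre-header"), treats the
     * next line as the column names, and maps every following line to them.
     * Assumption: fields never contain quoted commas or embedded newlines.
     */
    static List<Map<String, String>> parse(String text, int skipFirstRecords) {
        String[] lines = text.split("\r?\n");
        if (skipFirstRecords < 0 || skipFirstRecords >= lines.length) {
            throw new IllegalArgumentException(
                    "No header record left after skipping " + skipFirstRecords);
        }
        // A limit of -1 keeps trailing empty fields such as "a,b,,".
        String[] header = lines[skipFirstRecords].split(",", -1);
        List<Map<String, String>> records = new ArrayList<>();
        for (int i = skipFirstRecords + 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue; // ignore blank lines
            }
            String[] values = lines[i].split(",", -1);
            Map<String, String> record = new LinkedHashMap<>();
            for (int c = 0; c < header.length; c++) {
                // Missing trailing values are stored as empty strings.
                record.put(header[c], c < values.length ? values[c] : "");
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // The simplest sample file [2]: one pre-header record, then the header.
        String sample = "SRM_SaaS_ES,MXASSETInterface,AddChange,EN\n"
                + "ASSETNUM,AS_SITEID\n"
                + "mesa01,SDASITE\n";
        System.out.println(parse(sample, 1));
        // prints [{ASSETNUM=mesa01, AS_SITEID=SDASITE}]
    }
}
```

With skipFirstRecords set to 0 the same sketch would treat the pre-header itself as the column names, which is the misreading described above. Option (1) would instead keep the skipped records available to the caller rather than discarding them.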