db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bryan Pendleton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-4555) Expand SYSCS_IMPORT_TABLE to accept CSV file with header lines
Date Sat, 23 Jul 2016 15:09:20 GMT

    [ https://issues.apache.org/jira/browse/DERBY-4555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390717#comment-15390717

Bryan Pendleton commented on DERBY-4555:

I experimented with your change for a while, and I think it's
a definite improvement, and we should commit it.

But as I was doing my experiments, I found that the line number
handling in the import code was rather a mess.

For instance, in the case you found, your improvement changes
the error message from:

    [junit] Import error on line 1 of file extinout/pet.dat: Read end of file at unexpected
place on line 1. 


    [junit] Import error on line 1 of file extinout/pet.dat: Read end of file at unexpected
place on line 7. 

which is definitely better. But I was puzzled about that
"Import error on line 1" -- why didn't that change, too?

I discovered that there are multiple objects here, and there
is a separate ImportReadData object used by the readHeaders()
method which is maintaining its own lineNumber.

Similarly, in an earlier test in our test suite, where we do:
        //Invalid number of header lines of the input file causes NULL value error
        cSt = prepareCall(
            " call SYSCS_UTIL.SYSCS_IMPORT_DATA_BULK(null, "
            + "'PET1' , null , '\"Pet Name\",\"Kind of\",\"Age\"' ,   'extinout/pet.dat' "
            + "  , null , null , null, 0, 2) ");

the error message that occurs in this case is:
    [junit] Import error on line 1 of file extinout/pet.dat: Column 'C3'  cannot accept a
NULL value. 

That error message seems inaccurate to me, too, because the
NULL value for column C3 is actually arising on line 3 of the
input file, not on line 1.

I suspect that the problem involves the fact that the Import object and the ImportReadData
object are two separate objects, and the Import object has a "lineNumber" (which is actually
in the ImportAbstract superclass), and the ImportReadData object
has a separate "lineNumber".

It's possible that if ImportAbstract.getCurrentLineNumber()
were changed so that it checked to see if its importReadData
member was non-NULL, and then returned
importReadData.getCurrentRowNumber() rather than simply always
returning its own lineNumber, then these error messages would
report the lineNumber better in more cases.

But that seems to be over-and-above the scope of this issue,
so I'm leaving it alone for now.

> Expand SYSCS_IMPORT_TABLE to accept CSV file with header lines
> --------------------------------------------------------------
>                 Key: DERBY-4555
>                 URL: https://issues.apache.org/jira/browse/DERBY-4555
>             Project: Derby
>          Issue Type: Improvement
>          Components: Miscellaneous
>            Reporter: Yair Lenga
>            Assignee: Danoja Dias
>         Attachments: LineNumberIssue.diff, NoVarargs.diff, Varargs.diff, addNewSystemProcedureWithTest.diff,
addNewSystemProcedureWithTest_1.diff, addNewSystemProcedure_1.diff, gotException.diff, hardCoded.diff,
latest.diff, noHeaderLines.csv, petlist.csv, petlist.csv, petlist.csv, repro.java, repro.java,
repro.java, skipHeaders.diff
> The SYSCS_IMPORT_TABLE (and SYSCS_IMPORT_DATA) function allow import of data from external
resources. In general, they can process CSV files that created with various tools - with one
exception: the header line.
> While there is no accepted standard, most tools will include a header line in the CSV
file with column names. This convention is supported in Excel and many other tools.
> My Request: extend the SYSCS_IMPORT_TABLe and SYSCS_IMPORT_DATA (and other related procedures)
to include an extra indicator for the number of header lines to be ignored.
> As an extra bonus it will be help is the SYSCS_IMPORT_DATA will accept column names (instead
of column indexes) in the 'COLUMNINDEXES' arguments. E.g., it should be possible to indicate
COLUMNINDEXES of '1,3,sales,5,'. This feature will make it significantly easier to handle
cases where the external input files is extended to include additional columns.

This message was sent by Atlassian JIRA

View raw message