db-derby-dev mailing list archives

From "Bryan Pendleton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-4555) Expand SYSCS_IMPORT_TABLE to accept CSV file with header lines
Date Mon, 20 Jun 2016 14:01:05 GMT

    [ https://issues.apache.org/jira/browse/DERBY-4555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339523#comment-15339523 ]

Bryan Pendleton commented on DERBY-4555:

Yes, I like the suggested name of SYSCS_IMPORT_TABLE_BULK, too.

Regarding the request to parse columns by header-name, I think we
should look at that item as part of our work here, too.

But first I think we should complete the first phase, of getting
the new procedure name in place, writing the tests, etc.

Then we can take a look at doing the header-name parsing, and see
if we think it is hard or easy. I am not sure I understand the
proposed approach clearly. But there are some details: is the name
matching case-insensitive? Or case-exact? Does the matching care
about white-space? What if the column headers are these new
multi-line sort of headers? What if a column name should contain
a comma, or a space, or some other sort of special character?

I don't think these issues are particularly unsolvable, but I
would like to spend some time ensuring we have a clear and
precise description of the problem we're attempting to solve.
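
To make those questions concrete, here is a minimal sketch of one possible
matching policy, applied to the COLUMNINDEXES example from the issue
description ('1,3,sales,5,'). Everything in it is an assumption for the
sake of discussion, not Derby code: the class and method names are
hypothetical, names are matched case-insensitively after trimming
white-space, all-digit tokens are treated as 1-based positions, and a
trailing comma is tolerated. It takes the header names as an already
parsed list, so it does not address header fields that contain commas or
span several lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Derby.
class ColumnSpecResolver {

    // Assumed policy: ignore surrounding white-space and letter case.
    static String normalize(String name) {
        return name.trim().toUpperCase(Locale.ROOT);
    }

    // Turns a spec such as "1,3,sales,5," into 1-based column positions.
    static List<Integer> resolve(String spec, List<String> headerNames) {
        List<Integer> positions = new ArrayList<>();
        for (String token : spec.split(",")) {
            String t = token.trim();
            if (t.isEmpty()) {
                continue; // tolerate empty entries, e.g. a trailing comma
            }
            if (t.chars().allMatch(Character::isDigit)) {
                positions.add(Integer.parseInt(t)); // already a position
                continue;
            }
            int found = -1;
            for (int i = 0; i < headerNames.size(); i++) {
                if (normalize(headerNames.get(i)).equals(normalize(t))) {
                    found = i + 1; // first matching header wins
                    break;
                }
            }
            if (found < 0) {
                throw new IllegalArgumentException("No header named: " + t);
            }
            positions.add(found);
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> header =
            Arrays.asList("id", "region", "date", " Sales ", "qty");
        System.out.println(resolve("1,3,sales,5,", header)); // [1, 3, 4, 5]
    }
}
```

A case-exact policy would only change normalize(); the harder cases
(a column whose name is itself all digits, duplicate header names) would
still need a decision.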

Also, since we're talking about it: so far, we've been working
on providing a new alternate system procedure for SYSCS_IMPORT_TABLE,
but the column-name/header-name issue involves SYSCS_IMPORT_DATA,

I think that, assuming we can work out the final details of
we'll also want to provide a SYSCS_IMPORT_DATA_BULK alternative

So maybe what we have here are several sub-tasks:
1) Create new SYSCS_IMPORT_TABLE_BULK procedure, with extra
   (varargs?) argument at the end to support multi-line headers
2) Create new SYSCS_IMPORT_DATA_BULK procedure, similar to (1)
   procedure, to support recognition of columns by header-name
   as well as by index number
4) Add documentation for the new system procedures

We can use JIRA's sub-task functionality to track these sub-tasks
separately, and then mark DERBY-4555 as resolved once the entire
set of tasks is complete.
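
For sub-task (1), the core behavior is just "ignore the first N lines
of the file". A rough sketch of that idea follows, assuming a
hypothetical helper (the class name, method name, and skip parameter
are illustrative, not the proposed procedure's actual signature). It
counts physical lines, so a header field with a line break inside
quotes would need real CSV parsing instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Derby.
class HeaderSkipper {

    // Returns every line after the first `skip` physical lines.
    static List<String> readDataLines(BufferedReader in, int skip) {
        if (skip < 0) {
            throw new IllegalArgumentException("skip must be >= 0");
        }
        try {
            for (int i = 0; i < skip; i++) {
                if (in.readLine() == null) {
                    break; // input is shorter than the header count
                }
            }
            List<String> rows = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                rows.add(line);
            }
            return rows;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two header lines followed by two data rows (made-up sample data).
        String csv = "Pet list\nname,species\nRex,dog\nTom,cat\n";
        BufferedReader in = new BufferedReader(new StringReader(csv));
        System.out.println(readDataLines(in, 2)); // [Rex,dog, Tom,cat]
    }
}
```

With skip set to 0 the helper returns every line, which matches the
behavior of a file that has no header at all.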

> Expand SYSCS_IMPORT_TABLE to accept CSV file with header lines
> --------------------------------------------------------------
>                 Key: DERBY-4555
>                 URL: https://issues.apache.org/jira/browse/DERBY-4555
>             Project: Derby
>          Issue Type: Improvement
>          Components: Miscellaneous
>            Reporter: Yair Lenga
>            Assignee: Danoja Dias
>         Attachments: NoVarargs.diff, Varargs.diff, addNewSystemProcedure_1.diff, gotException.diff,
hardCoded.diff, latest.diff, noHeaderLines.csv, petlist.csv, petlist.csv, petlist.csv, repro.java,
repro.java, repro.java, skipHeaders.diff
> The SYSCS_IMPORT_TABLE (and SYSCS_IMPORT_DATA) functions allow import of data from external
resources. In general, they can process CSV files that were created with various tools - with one
exception: the header line.
> While there is no accepted standard, most tools will include a header line in the CSV
file with column names. This convention is supported in Excel and many other tools.
> My Request: extend the SYSCS_IMPORT_TABLE and SYSCS_IMPORT_DATA (and other related procedures)
to include an extra indicator for the number of header lines to be ignored.
> As an extra bonus it will be helpful if SYSCS_IMPORT_DATA will accept column names (instead
of column indexes) in the 'COLUMNINDEXES' argument. E.g., it should be possible to indicate
COLUMNINDEXES of '1,3,sales,5,'. This feature will make it significantly easier to handle
cases where the external input file is extended to include additional columns.

This message was sent by Atlassian JIRA
