cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-14391) COPY FROM should read columns from file header
Date Thu, 26 Apr 2018 02:06:00 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-14391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Stefania updated CASSANDRA-14391:
---------------------------------
    Summary: COPY FROM should read columns from file header  (was: COPY FROM ignores headers)

> COPY FROM should read columns from file header
> ----------------------------------------------
>
>                 Key: CASSANDRA-14391
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14391
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>         Environment: cqlsh 5.0.1 and Cassandra 3.11.2 on macOS 10.13.2.
>            Reporter: M. Justin
>            Priority: Major
>
> COPY FROM appears to ignore the headers value, even when "headers = true" is specified.
This means that if the columns are reordered, the import process will save values in the wrong
columns.  Additionally, if there are missing columns, an error occurs, even if those columns
are not primary key columns.
> This behavior contradicts the behavior [specified in the docs|https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshCopy.html]
(emphasis mine).
> {quote}COPY FROM imports data from a CSV file into an existing table. Each line in the
source file is imported as a row. All rows in the dataset must contain the same number of
fields and have values in the PRIMARY KEY fields. The process verifies the PRIMARY KEY and
updates existing records. If {{HEADER = false}} and no column names are specified, the fields
are imported in deterministic order. *When column names are specified, fields are imported
in that order. Missing and empty fields are set to null. The source cannot have more fields
than the target table, however it can have fewer fields.*
> {quote}
> h2. Example
> {noformat:title=temp.csv}
> col2,col1,col3
> column value 1,key2,3
> column value 2,key4,3
> column value 3,key3,3
> column value 4,key1,3
> {noformat}
> {code:sql}
> create keyspace copy_to_from_test WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor'
: 1 };
> use copy_to_from_test;
> create table test_table (col1 text primary key, col2 text, col3 bigint);
> copy test_table from 'temp.csv' with header = true;
> {code}
> The above code will incorrectly swap the "col2" and "col1" values, since it expects the
first column to be "col1". If I had instead swapped the order of "col3", I would have received
an error on input, as it would have attempted to store text in a numerical column.
> h2.  Expected Behavior
> I would expect specifying "with header = true" on a COPY FROM statement to use the headers
as column names for insertion, rather than merely skipping the header row.  Missing non-primary
key columns should be set to null.
> h2. Other
> I ran across this issue when copying between two of my environments. One of the environments
had changed the columns in the primary key, but the other had not yet. This caused the order
of the columns to vary between the environments.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message