cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13071) cqlsh copy-from should error out when csv contains invalid data for collections
Date Mon, 27 Feb 2017 03:18:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885069#comment-15885069 ]

Stefania commented on CASSANDRA-13071:
--------------------------------------

Thanks for the review!

bq. Is this supposed to be working or are we supposed to use type-based delimiters

It's actually supposed to be working, but I'm not entirely sure about it. I've left a comment
[here|https://github.com/stef1927/cassandra/commit/170e21fa3d8da8661a4cd500c3507d3919717eff#diff-27e394435c04a60c58ec9d5c34397341R1889].
Basically, I was hesitant to enforce the correct parentheses for each type, in order to avoid breaking
data that so far could be imported, albeit data that is incorrect CQL. On the flip side, without that
check we could, for example, incorrectly convert a list to a set and vice versa if two columns are
swapped by mistake. Missing columns should be detected by the check on the total number of
columns, so I am not too worried about collections being converted to another collection type
due to missing columns. Perhaps we should just enforce type-specific parentheses in 4.0?
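
To make the trade-off concrete, here is a minimal sketch of what enforcing type-based delimiters could look like. It is not the code in copyutil.py: the {{EXPECTED_DELIMITERS}} mapping and the {{check_delimiters}} helper are hypothetical names used only for illustration.

```python
# Hypothetical mapping, not taken from copyutil.py: the delimiters that
# CQL literals use for each collection-like type.
EXPECTED_DELIMITERS = {
    'list': ('[', ']'),
    'set': ('{', '}'),
    'map': ('{', '}'),
    'tuple': ('(', ')'),
}


def check_delimiters(value, cql_type):
    """Return the text between the delimiters, or raise ValueError if the
    value is not wrapped in the delimiters expected for cql_type."""
    opening, closing = EXPECTED_DELIMITERS[cql_type]
    if len(value) < 2 or value[0] != opening or value[-1] != closing:
        raise ValueError("expected %s...%s for type %s, got %r"
                         % (opening, closing, cql_type, value))
    return value[1:-1]
```

With a check like this, a list literal such as {{['a', 'b']}} landing in a set column is rejected instead of being silently converted. Sets and maps share the same braces, so delimiters alone would not catch a set and a map column being swapped.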

bq. The dtests results look good, but it seems they were not triggered using the new dtest
branch you created so I re-triggered them using your branch.

I didn't realize we could specify a dtest branch for the cqlsh tests, thanks for relaunching
them. The results are clean for 3.0 and 3.11, but there was a problem for trunk: CASSANDRA-10520
broke the cqlshlib tests, and this caused the entire job to fail. I've ninja fixed this, rebased
and relaunched.

> cqlsh copy-from should error out when csv contains invalid data for collections
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13071
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13071
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Stefania
>            Assignee: Stefania
>            Priority: Minor
>             Fix For: 3.0.x, 3.11.x
>
>
> If the csv file contains invalid data for collection types, the data is currently
imported incorrectly; an error would be better behavior.
> For example this table:
> {code}
> CREATE TABLE test.test (key text, value frozen<set<text>>, PRIMARY KEY (key));
> {code}
> with this data:
> {code}
> "key1","{'test1', 'test2'}"
> "Key2","not_a_set"
> {code}
> will be imported by {{COPY test.test FROM 'test.csv';}} without errors but will result
in the following data:
> {code}
> cqlsh> select * from test.test;
>  key  | value
> ------+--------------------
>  key1 | {'test1', 'test2'}
>  Key2 |        {'ot_a_se'}
> (2 rows)
> {code}
> The second row should have been rejected. The reason is that the [{{split}} function|https://github.com/stef1927/cassandra/blob/trunk/pylib/cqlshlib/copyutil.py#L1898]
assumes that the first and last characters of the string passed in are parentheses, without
actually checking it.
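
The failure mode described above can be reproduced in a few lines. This is a simplified sketch, not the actual {{split}} implementation in copyutil.py; {{naive_strip}} and {{checked_strip}} are hypothetical helpers that only model the stripping of the enclosing characters.

```python
def naive_strip(value):
    # Assumes, without checking, that the first and last characters
    # are the enclosing brackets of a collection literal.
    return value[1:-1]


def checked_strip(value):
    # Lenient check: accept any matching bracket pair and reject
    # everything else instead of silently dropping characters.
    pairs = {'{': '}', '[': ']', '(': ')'}
    if len(value) < 2 or pairs.get(value[0]) != value[-1]:
        raise ValueError("not a collection literal: %r" % (value,))
    return value[1:-1]


print(naive_strip("not_a_set"))             # ot_a_se
print(checked_strip("{'test1', 'test2'}"))  # 'test1', 'test2'
```

Calling {{checked_strip("not_a_set")}} raises {{ValueError}}, which is the point at which the second row of the example file could be rejected rather than imported as {{{'ot_a_se'}}}.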



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
