cassandra-commits mailing list archives

From "Hiroyuki Nishi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12174) COPY FROM should raise error for non-existing input files
Date Mon, 25 Jul 2016 13:20:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391869#comment-15391869 ]

Hiroyuki Nishi commented on CASSANDRA-12174:
--------------------------------------------

Hi [~Stefania],
Thanks for your response.

I changed the patch as follows:
 https://github.com/yhnishi/cassandra/commit/db75d9dd0d74d3476d500f6b99c22e117dc73ec6


Below are sample results.
Success:
{code}
cqlsh> COPY test.airplanes (name, manufacturer, year, mach) FROM '/tmp/1.csv';
Using 7 child processes

Starting copy of test.airplanes with columns [name, manufacturer, year, mach].
Processed: 1 rows; Rate:       2 rows/s; Avg. rate:       2 rows/s
1 rows imported from 1 files in 0.420 seconds (0 skipped).


cqlsh> COPY test.airplanes (name, manufacturer, year, mach) FROM '/tmp/1.csv,/tmp/2.csv';
Using 7 child processes

Starting copy of test.airplanes with columns [name, manufacturer, year, mach].
Processed: 2 rows; Rate:       3 rows/s; Avg. rate:       5 rows/s
2 rows imported from 2 files in 0.418 seconds (0 skipped).


cqlsh> COPY test.airplanes (name, manufacturer, year, mach) FROM '/tmp/*.csv';
Using 7 child processes

Starting copy of test.airplanes with columns [name, manufacturer, year, mach].
Processed: 2 rows; Rate:       3 rows/s; Avg. rate:       5 rows/s
2 rows imported from 2 files in 0.413 seconds (0 skipped).
{code}

Error:
{code}
cqlsh> COPY test.airplanes (name, manufacturer, year, mach) FROM '/tmp/1234-doesnotexist';
Using 7 child processes

Starting copy of test.airplanes with columns [name, manufacturer, year, mach].
Failed to import 0 rows: IOError - Can't open '/tmp/1234-doesnotexist' for reading: file does not exist,  given up after 1 attempts
Processed: 0 rows; Rate:       0 rows/s; Avg. rate:       0 rows/s
0 rows imported from 0 files in 0.218 seconds (0 skipped).


cqlsh> COPY test.airplanes (name, manufacturer, year, mach) FROM '/tmp/*-doesnotexist';
Using 7 child processes

Starting copy of test.airplanes with columns [name, manufacturer, year, mach].
Failed to import 0 rows: IOError - Can't open '/tmp/*-doesnotexist' for reading: file does not exist,  given up after 1 attempts
Processed: 0 rows; Rate:       0 rows/s; Avg. rate:       0 rows/s
0 rows imported from 0 files in 0.218 seconds (0 skipped).


cqlsh> COPY test.airplanes (name, manufacturer, year, mach) FROM '/tmp/1234-doesnotexist,/tmp/1235-doesnotexist';
Using 7 child processes

Starting copy of test.airplanes with columns [name, manufacturer, year, mach].
Failed to import 0 rows: IOError - Can't open '/tmp/1234-doesnotexist' for reading: file does not exist,  given up after 1 attempts
Processed: 0 rows; Rate:       0 rows/s; Avg. rate:       0 rows/s
0 rows imported from 0 files in 0.217 seconds (0 skipped).


cqlsh> COPY test.airplanes (name, manufacturer, year, mach) FROM '/tmp/1.csv,/tmp/*-doesnotexist';
Using 7 child processes

Starting copy of test.airplanes with columns [name, manufacturer, year, mach].
Failed to import 0 rows: IOError - Can't open '/tmp/*-doesnotexist' for reading: file does not exist,  given up after 1 attempts
Processed: 0 rows; Rate:       0 rows/s; Avg. rate:       0 rows/s
0 rows imported from 1 files in 0.219 seconds (0 skipped).
{code}
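
For illustration only, the path handling that the results above exercise (comma-separated sources, glob patterns, and reporting of entries that match nothing) could be sketched roughly as below. This is not the code from the patch: the function name `resolve_sources` and its return shape are hypothetical, and it assumes that a source counts as missing when it is neither an existing file nor a glob pattern matching at least one file.

```python
import glob
import os


def resolve_sources(spec):
    """Split a comma-separated source spec and expand glob patterns.

    Hypothetical helper, not part of cqlsh. Returns a tuple
    (files, missing): the existing files found, and the entries
    of the spec that did not resolve to any file.
    """
    files, missing = [], []
    for pattern in spec.split(','):
        pattern = pattern.strip()
        if not pattern:
            continue  # ignore empty entries such as a trailing comma
        path = os.path.expanduser(pattern)
        if os.path.isfile(path):
            # A literal path that exists is taken as-is.
            files.append(path)
            continue
        # Otherwise treat the entry as a glob pattern.
        matches = sorted(p for p in glob.glob(path) if os.path.isfile(p))
        if matches:
            files.extend(matches)
        else:
            # Neither an existing file nor a matching pattern.
            missing.append(pattern)
    return files, missing
```

A caller could then report one error per entry in `missing` while still importing from `files`, which is the shape of the last error case above, where '/tmp/1.csv' exists and '/tmp/*-doesnotexist' matches nothing.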


Please check the patch once again.

> COPY FROM should raise error for non-existing input files
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-12174
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12174
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Stefan Podkowinski
>            Assignee: Hiroyuki Nishi
>            Priority: Minor
>              Labels: lhf
>         Attachments: CASSANDRA-12174-trunk.patch
>
>
> Currently the CSV COPY FROM command will not raise any error for non-existing paths. Instead only "0 rows imported" will be shown as result.
> As the COPY FROM command is often used for tutorials and getting started guides, I'd suggest giving a clear error message in case of a missing input file. Without such an error it can be confusing for the user to see the command actually finish, without any clues why no rows have been imported.
> {noformat}
> CREATE KEYSPACE test
>   WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 };
> USE test;
> CREATE TABLE airplanes (
>   name text PRIMARY KEY,
>   manufacturer ascii,
>   year int,
>   mach float
> );
> COPY airplanes (name, manufacturer, year, mach) FROM '/tmp/1234-doesnotexist';
> Using 3 child processes
> Starting copy of test.airplanes with columns [name, manufacturer, year, mach].
> Processed: 0 rows; Rate:       0 rows/s; Avg. rate:       0 rows/s
> 0 rows imported from 0 files in 0.216 seconds (0 skipped).
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
