cassandra-commits mailing list archives

From "Adam Holmberg (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9302) Optimize cqlsh COPY FROM, part 3
Date Mon, 14 Dec 2015 17:57:46 GMT


Adam Holmberg commented on CASSANDRA-9302:

bq. DCAware policy fixed it, now we have only one session.
Now that we're not choosing a session based on replica host, we might further simplify {{split_batches}}
to just group by partition key (i.e., no need for {{get_replica}}). Alternatively, if you
want to send to a specific host other than the one that load balancing would choose, we would
need to borrow a connection and send directly on it (I don't think that's worth doing).
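
To illustrate the first option, here is a minimal sketch of grouping by partition key only. It is not the code in the patch: the function name {{group_by_partition_key}}, its parameters, and the {{max_batch_size}} default are all hypothetical, and rows are assumed to be plain tuples.

```python
from collections import defaultdict


def group_by_partition_key(rows, partition_key_indexes, max_batch_size=20):
    """Yield (partition_key, rows) batches, each holding a single partition.

    Hypothetical sketch: the name, signature and default batch size are
    assumptions for illustration, not taken from cqlsh.
    """
    rows_by_pk = defaultdict(list)
    for row in rows:
        # The partition key is the tuple of values at the key column positions.
        pk = tuple(row[i] for i in partition_key_indexes)
        rows_by_pk[pk].append(row)

    for pk, pk_rows in rows_by_pk.items():
        # Cap each batch so one large partition does not become one huge batch.
        for start in range(0, len(pk_rows), max_batch_size):
            yield pk, pk_rows[start:start + max_batch_size]
```

No replica lookup is involved here; which host receives each batch is left to the session's load balancing policy.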

I find it a little awkward that numeric option values require quoting:
cassandra@cqlsh> COPY test.t FROM 'f.csv' WITH HEADER = false AND REPORTFREQUENCY = 100;
Improper COPY command.
cassandra@cqlsh> COPY test.t FROM 'f.csv' WITH HEADER = false AND REPORTFREQUENCY = '100';

Starting copy of test.t...
Is that a hard thing to change?
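
For what it's worth, once the value reaches Python, accepting both forms could look something like the sketch below. This is only an assumption about one possible approach: {{parse_int_option}} is a hypothetical helper, not an existing cqlsh function, and since the error above appears to come from command parsing, the COPY grammar would probably also need to accept an unquoted number, which is not shown here.

```python
def parse_int_option(raw, name):
    """Return an int for 100, "100" or "'100'" (hypothetical helper).

    `raw` is the option value as received from the parser; `name` is only
    used to build the error message.
    """
    text = str(raw).strip()
    # Drop one pair of matching surrounding quotes, if present.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in ("'", '"'):
        text = text[1:-1]
    try:
        return int(text)
    except ValueError:
        raise ValueError("Invalid value for %s: %r" % (name, raw))
```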

> Optimize cqlsh COPY FROM, part 3
> --------------------------------
>                 Key: CASSANDRA-9302
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>            Priority: Critical
>             Fix For: 2.1.x
> We've had some discussion moving to Spark CSV import for bulk load in 3.x, but people
> need a good bulk load tool now.  One option is to add a separate Java bulk load tool (CASSANDRA-9048),
> but if we can match that performance from cqlsh I would prefer to leave COPY FROM as the preferred
> option to which we point people, rather than adding more tools that need to be supported indefinitely.
> Previous work on COPY FROM optimization was done in CASSANDRA-7405 and CASSANDRA-8225.

This message was sent by Atlassian JIRA
