cassandra-commits mailing list archives

From "Tyler Hobbs (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9302) Optimize cqlsh COPY FROM, part 3
Date Wed, 05 Aug 2015 22:16:07 GMT


Tyler Hobbs commented on CASSANDRA-9302:

bq. Has there been discussion anywhere about implementing a loader on the Java driver, now
that it's bundled with the server?

Yes, there was quite a bit on CASSANDRA-8225.  To summarize, we're making cqlsh "good enough"
for most cases, and planning on using Spark for everything else.

bq. I hope nobody is surprised that the Python implementation is much slower than the C implementation.
[...] Hopefully we can amortize this with batching by partition and/or giving it more processes.

If wide partitions are used, I think this will be okay.  We could perhaps take a quick sample
of the file to determine if that's the case, and if not, skip using token-aware routing (TAR) with murmur3.
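
A rough sketch of what such a sampling check might look like, purely for illustration. The function name `looks_wide`, the `key_columns` argument, and both thresholds are hypothetical and not part of cqlsh. It also assumes rows for the same partition appear close together in the file; otherwise a sample from the head of the file would underestimate partition width.

```python
import csv
from collections import Counter
from itertools import islice


def looks_wide(csv_file, key_columns, sample_size=1000, min_rows_per_partition=2.0):
    """Guess from the first `sample_size` rows whether the file holds wide partitions.

    `key_columns` lists the CSV column indexes that make up the partition key.
    Returns True when the sample averages at least `min_rows_per_partition`
    rows per distinct partition key (hypothetical threshold).
    """
    reader = csv.reader(csv_file)
    needed = max(key_columns) + 1
    counts = Counter(
        tuple(row[i] for i in key_columns)
        for row in islice(reader, sample_size)
        if len(row) >= needed  # ignore blank or truncated rows
    )
    sampled = sum(counts.values())
    if sampled == 0:
        return False
    return float(sampled) / len(counts) >= min_rows_per_partition
```

If the check returns False, the importer could fall back to sending rows without computing tokens on the client side.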

> Optimize cqlsh COPY FROM, part 3
> --------------------------------
>                 Key: CASSANDRA-9302
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jonathan Ellis
>            Assignee: David Kua
>             Fix For: 2.1.x
> We've had some discussion moving to Spark CSV import for bulk load in 3.x, but people
> need a good bulk load tool now.  One option is to add a separate Java bulk load tool (CASSANDRA-9048),
> but if we can match that performance from cqlsh I would prefer to leave COPY FROM as the preferred
> option to which we point people, rather than adding more tools that need to be supported indefinitely.
> Previous work on COPY FROM optimization was done in CASSANDRA-7405 and CASSANDRA-8225.
