cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7405) Optimize cqlsh COPY TO and COPY FROM
Date Mon, 18 Aug 2014 23:29:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101519#comment-14101519 ]

Aleksey Yeschenko commented on CASSANDRA-7405:
----------------------------------------------

This should perhaps be handled separately, in another ticket, but there are a few more things
we could optimize (all import-related):

1. If we assume that a significant subset of the CSV files passed to COPY FROM are the output
of a COPY TO command, then rows will be grouped by partition key. In that case we would benefit
from batching: keep adding rows to a batch until a different partition key is met, subject to
a limit on rows per batch, since we don't want huge batches.
2. Additionally, we could switch to prepared statements for writes (assuming the Python serialization
cost wouldn't outweigh the server-side benefits). It's a bit involved, but may be worth
it.
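
To illustrate the grouping in point 1, here is a minimal sketch, not the cqlsh implementation. The
names batches_by_partition, key_of and max_rows are hypothetical, and the default limit of 100
is an arbitrary assumption. It only groups consecutive rows that share a partition key; turning
each chunk into an actual batch write is left out.

```python
from itertools import groupby, islice


def batches_by_partition(rows, key_of, max_rows=100):
    """Yield (partition_key, chunk) pairs from an iterable of rows.

    Only *consecutive* rows with the same key are grouped, which matches
    the assumption that the CSV came from COPY TO. Each chunk holds at
    most max_rows rows, so no single batch grows too large.
    """
    for key, group in groupby(rows, key=key_of):
        while True:
            # Take up to max_rows rows from the current run of equal keys.
            chunk = list(islice(group, max_rows))
            if not chunk:
                break
            yield key, chunk


if __name__ == "__main__":
    # Hypothetical rows: the first column plays the role of the partition key.
    sample = [("a", 1), ("a", 2), ("a", 3), ("b", 1)]
    for key, chunk in batches_by_partition(sample, key_of=lambda r: r[0], max_rows=2):
        print(key, chunk)
```

If the input is not actually sorted by partition key, this still works, but it degrades to
many small chunks, so nothing is lost compared to writing rows one at a time.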

We should also prepare the SELECT, really. It doesn't win us a lot, but it is a trivial change,
so it is probably worth it.


> Optimize cqlsh COPY TO and COPY FROM
> ------------------------------------
>
>                 Key: CASSANDRA-7405
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7405
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksey Yeschenko
>            Assignee: Mikhail Stepura
>             Fix For: 2.1.1
>
>         Attachments: CASSANDRA-2.1-7405.patch
>
>
> Now that we are using native proto via python-driver, we can, and should, at the very least:
> 1. Use proto paging in COPY TO
> 2. Use async writes in COPY FROM



--
This message was sent by Atlassian JIRA
(v6.2#6252)
